Abstract


The relative power of several computational models is considered. These models are the Turing
machine and its multidimensional variant, the random access machine (RAM), the tree machine,
and the pointer machine. The basic computational properties of the pointer machine are
examined in more detail. For example, time and space hierarchy theorems for pointer machines
are presented.

Every Turing machine of time complexity t and space complexity s can be simulated by a
pointer machine of time complexity O(t) using O(s/log s) nodes. This strengthens a similar
result by van Emde Boas (1989). Every alternating pointer machine of time complexity t can be
simulated by a deterministic pointer machine using O(t/log t) nodes. Other results concerning
nondeterministic and alternating pointer machines are presented.

Every tree machine of time complexity t can be simulated on-line by a log-cost RAM of
time complexity O((t log t)/log log t). This simulation is shown to be optimal using the notion
of incompressibility from Kolmogorov complexity (Solomonoff, 1964; Kolmogorov, 1965).

Every d-dimensional Turing machine of time complexity t can be simulated on-line by a
log-cost RAM running in time O(t (log t)^{1-(1/d)} (log log t)^{1/d}). There is a log-cost RAM R
running in time t such that every d-dimensional Turing machine requires time
Ω(… / (log t (log log t)^{1+(1/d)})) to simulate R on-line. Every unit-cost RAM of time
complexity t can be simulated on-line by a d-dimensional Turing machine in time
O(t(n)^2 log t(n)).
Chapter 1


Introduction 


1.1  Background 

Since  the  introduction  of  the  Turing  machine,  computer  scientists  have  devised  many  abstract 
models  of  computation  to  study  different  aspects  of  computation.  They  began  by  adding 
features  to  the  Turing  machine  so  that  describing  a  particular  Turing  machine  program  was  not 
so  cumbersome.  Later,  models  with  random  access  storage  were  introduced  as  a  more  realistic 
alternative  to  the  Turing  machine.  More  exotic  models  have  appeared  in  the  literature  as  well. 

With  the  introduction  of  each  new  computational  model  there  comes  the  question  of  its 
quantitative  relationship  to  its  rivals.  It  is  clear  that  all  the  most  prominent  models  (excluding 
models  such  as  the  finite  state  automaton  and  the  pushdown  automaton)  accept  the  recursively 
enumerable  languages,  but  they  do  so  with  varying  degrees  of  time  and  space  efficiency. 

The  purpose  of  this  thesis  is  to  specify  relationships  between  models  of  computation.  In 
particular,  we  describe  the  relative  computational  power  between  specific  models  by  designing 
and  analyzing  simulations  of  one  model  by  another  model.  We  show  in  some  cases  that  there 
are  lower  bounds  to  the  speed  of  simulations  of  one  machine  by  another. 

1.2  Summary  of  Results 

The  models  we  consider  are  the  Turing  machine  and  its  multidimensional  variant,  the  random 
access  machine  (RAM),  the  tree  machine,  and  the  pointer  machine.  We  examine  in  more 
detail the basic computational properties of the pointer machine, a newer model that has been
neglected  in  the  literature  but  seems  to  have  some  interesting  properties. 

We  begin  our  review  of  the  pointer  machine  by  showing  that  space  compression  is  possible. 
We  then  describe  how  space  equivalence  of  pointer  machines  to  other  models  of  computation 
depends on the definition of pointer machine space complexity. We show that every Turing
machine of time complexity t and space complexity s can be simulated by a pointer machine of
time complexity O(t) using O(s/log s) nodes. This strengthens a similar result by van Emde
Boas (1989).

We  give  time  and  space  hierarchy  theorems  for  pointer  machines.  With  respect  to  the  time 
and  space  hierarchies,  pointer  machines  are  similar  to  Turing  machines  and  RAMs.  We  show 
that  every  alternating  pointer  machine  of  time  complexity  t  can  be  simulated  by  a  deterministic 
pointer machine using O(t/log t) nodes. We present other results concerning nondeterministic
and  alternating  pointer  machines. 

We describe how every tree machine of time complexity t can be simulated on-line by a
log-cost RAM of time complexity O((t log t)/log log t). Using the notion of incompressibility from
Kolmogorov complexity (Li and Vitanyi, 1988), we show that our simulation method is optimal.
This  appears  to  be  the  first  application  of  Kolmogorov  complexity  to  sequential  RAMs.  It  is 
significant  because  few  algorithms  have  been  shown  to  be  optimal. 

Using similar techniques, we show that every d-dimensional Turing machine of time
complexity t can be simulated on-line by a log-cost RAM running in time
O(t (log t)^{1-(1/d)} (log log t)^{1/d}). For d = 1, the running time is O(t log log t), which is
the same as the result of Katajainen et al. (1988).
For simulations of RAMs by multidimensional Turing machines, we show that there is a
log-cost RAM R running in time t such that every d-dimensional Turing machine requires
time Ω(… / (log t (log log t)^{1+(1/d)})) to simulate R on-line. We describe how every unit-cost
RAM of time complexity t can be simulated on-line by a d-dimensional Turing machine in time
O(t(n)^2 log t(n)). We also give a lower bound for on-line simulations of unit-cost RAMs by
multidimensional Turing machines.

1.3  Motivation 

Our  research  is  significant  for  a  number  of  reasons.  A  general  simulation  provides  automatic 
transformation  of  algorithms  for  one  model  into  algorithms  for  another  model.  This  is  especially 
important  when  one  of  the  models  is  a  Turing  machine  or  RAM,  since  these  machines  are  clearly 
the  standard  models  of  computation:  the  Turing  machine  for  its  historical  significance,  the  RAM 
because  it  most  closely  resembles  a  real  computer. 

Another reason to examine relationships between these models of computation is to
determine how architectural enhancements (for example, multidimensional tapes on a Turing
machine) affect the speed of a model.

A  related  issue  is  how  alterations  in  the  definition  of  time  and  space  measures  for  a  model 
affect  the  model’s  complexity.  We  are  particularly  interested  in  the  tradeoff  between  log-cost 
RAMs  with  the  standard  instruction  set  and  unit-cost  RAMs  with  a  weaker  instruction  set 
(allowing  only  the  successor  function).  We  have  tried  to  determine  their  relationship  directly, 
but  we  have  also  examined  their  relationship  by  looking  at  how  they  each  relate  to  other  models. 

An  additional  motivation  for  research  into  simulations  between  these  models  concerns  the 
idea  of  data  structure  representation.  The  simulations  provide  insight  into  efficient  embeddings 
between data structures. For instance, an optimal simulation of a multidimensional Turing
machine  by  a  RAM  may  indicate  how  best  to  represent  an  array  in  a  set  of  registers. 

It  is  clear  that  researchers  will  continue  to  use  the  models  investigated  in  this  thesis.  We 
feel  that  our  results  add  significantly  to  the  knowledge  base  of  computational  modeling. 

1.4  Overview 

In  Chapter  2,  we  introduce  the  machines  we  investigate,  providing  detailed  descriptions  of  each 
model,  as  well  as  definitions  of  their  time  and  space  complexities.  In  Chapter  3,  we  survey 
research  pertinent  to  our  study  and  review  previous  results  involving  these  models. 

We  investigate  the  pointer  machine  in  Chapter  4.  We  present  some  basic  computational 
properties of the pointer machine. Chapter 4 also contains our results on space equivalence of
pointer machines, Turing machines, and RAMs, as well as some theorems about nondeterministic
and alternating pointer machines.

Chapter 5 features our on-line simulation of a tree machine by a log-cost RAM. We describe
the simulation and show that it is optimal. In Chapter 6, we exhibit on-line simulations
between  multidimensional  Turing  machines  and  RAMs,  in  both  directions,  and  present  some 
lower  bounds. 

In  Chapter  7,  we  present  open  problems  suggested  by  the  results  of  Chapters  4,  5,  and  6, 
and  we  describe  the  significance  of  their  solutions. 

A  preliminary  version  of  Chapter  4  appears  in  a  Coordinated  Science  Laboratory  technical 
report  (Luginbuhl  and  Loui,  1988).  Preliminary  versions  of  Chapter  5  and  Section  6.1  also 
appear  in  a  technical  report  (Loui  and  Luginbuhl,  1989). 
Chapter 2


Definitions  and  Notation 


In  this  chapter,  we  introduce  the  models  of  computation  we  consider.  We  also  describe  terms 
and  notation  we  use  throughout  the  thesis.  All  logarithms  in  this  thesis  are  taken  to  base  2. 
For  simplicity,  we  omit  floors  and  ceilings. 

2.1  Turing  Machines 

We  assume  the  reader  is  familiar  with  the  Turing  model,  its  basic  variations,  and  the  time  and 
space  measures  of  the  Turing  machine  as  described  by  Hopcroft  and  Ullman  (1979).  Because 
many  of  our  results  concern  multidimensional  Turing  machines,  we  describe  them  here  in  more 
detail. 

A multihead d-dimensional Turing machine consists of a finite control and a finite number
of d-dimensional worktapes, each with at least one worktape head. A d-dimensional worktape
comprises an infinite number of cells, each of which is assigned a d-tuple of integers called
the coordinates of the cell; for instance, the coordinates of cell x are (x_1, x_2, ..., x_d). The
coordinates of adjacent cells differ in just one component of the d-tuple by ±1. Call the cell
with coordinates (0, 0, ..., 0) the origin.

At each step of the computation, the machine reads the symbols in the currently accessed
input and worktape cells, (possibly) writes symbols on the currently accessed output and
worktape cells, (possibly) shifts the input head, and shifts each worktape head in one of 2d + 1
directions: either to one of 2d adjacent cells or to the same cell.

A  box  is  a  set  of  cells  that  form  a  d-dimensional  cube.  The  volume  of  a  box  is  the  number  of 
cells  it  contains.  The  base  cell  of  a  box  is  the  cell  within  the  box  with  the  smallest  coordinates. 
The distance between two cells is the sum of the absolute values of the differences between their
corresponding coordinate components; this is sometimes called the L1-distance or rectilinear
distance.

Lemma 2.1 Let M be a d-dimensional Turing machine with d > 2. Let a worktape of M have
a box B of volume v with the origin as B's base cell, and let x be a cell within box B. If the
coordinates of x are written on a separate worktape, then M can access cell x in time O(v^{1/d}).

Proof. M moves a worktape head to the origin (the boundaries of B are specially marked
so that the origin can be found). M then moves this worktape head in the first direction the
distance specified by the corresponding component of x's coordinates, decrementing that
component as the head moves in B. M does the same for each component, until the head
arrives at cell x. Decrementing a counter specifying value v' takes time O(v'). Since each of
x's d components specifies a value of at most v^{1/d}, decrementing all d components takes time
at most d · O(v^{1/d}) = O(v^{1/d}) (since d is a constant). The distance from x to the origin is at
most d v^{1/d}, so M moves the head across O(v^{1/d}) cells. Thus the total time to move the head
to x is O(v^{1/d}). □
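
The walk in the proof of Lemma 2.1 can be illustrated with a short Python sketch. It is not part of the original proof: the function name `access_cell` and the parameter `side` (the side length v^{1/d} of the box) are hypothetical, the head is assumed to start at the origin already, and only head shifts are counted, not the cost of decrementing the counters.

```python
def access_cell(target, side):
    """Walk a head from the origin of a d-dimensional box to `target`.

    `target` is a tuple of d coordinates, each in range(side), where
    `side` plays the role of v**(1/d) for a box of volume v.
    Returns (final head position, number of head shifts).
    """
    if any(not 0 <= c < side for c in target):
        raise ValueError("target lies outside the box")
    position = [0] * len(target)   # the head starts at the origin
    counters = list(target)        # x's coordinates, kept on a separate tape
    steps = 0
    for axis in range(len(target)):
        # Move along one axis until this component's counter reaches zero.
        while counters[axis] > 0:
            counters[axis] -= 1
            position[axis] += 1
            steps += 1
    return tuple(position), steps


if __name__ == "__main__":
    print(access_cell((3, 0, 2), side=4))
```

The number of shifts equals the L1-distance from the origin to x, which is at most d times the side length of the box, in line with the bound in the proof.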

2.2  Tree  Machines 

A  tree  machine,  a  generalization  of  a  Turing  machine,  has  a  storage  structure  that  consists  of 
a  finite  collection  of  complete  infinite  rooted  binary  trees,  called  tree  worktapes.  Each  cell  of  a 
worktape can store a 0 or 1. Each worktape has one head. A worktape head can shift to a cell’s
parent  or  to  its  left  or  right  child.  Initially,  every  worktape  head  is  on  the  root  of  its  worktape, 
and  all  cells  contain  0. 

Let W be a tree worktape. We fix a natural bijection between the positive integers and cells
of W. We refer to the integer corresponding to a particular cell as that cell’s location. Write
cell(b) for the cell at location b. Define cell(1) as the root of W. Then cell(2b) is the left child
of cell(b) and cell(2b + 1) is the right child of cell(b).
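
As an illustration of this numbering, the following Python sketch computes parents, children, and depths directly from locations. The function names are hypothetical and chosen only for this example.

```python
def left_child(b):
    """Location of the left child of cell(b)."""
    return 2 * b


def right_child(b):
    """Location of the right child of cell(b)."""
    return 2 * b + 1


def parent(b):
    """Location of the parent of cell(b); the root, cell(1), has none."""
    if b <= 1:
        raise ValueError("the root has no parent")
    return b // 2


def depth(b):
    """Distance from the root to cell(b)."""
    return b.bit_length() - 1


def path_from_root(b):
    """Head shifts ('L' or 'R') that lead from the root to cell(b).

    The binary digits of b after the leading 1 spell out the path:
    0 means left child, 1 means right child.
    """
    return ["L" if bit == "0" else "R" for bit in bin(b)[3:]]


if __name__ == "__main__":
    print(path_from_root(6), depth(6))
```

Under this bijection the depth of a cell is roughly the logarithm of its location, which is why locations of cells within depth d fit in about d bits.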

Each step of a tree machine consists of reading the contents of the worktape cells and input
cell  currently  scanned,  writing  back  on  the  same  worktape  cells  and  (possibly)  to  the  currently 
accessed  output  cell,  and  (possibly)  shifting  each  worktape  head  and  the  input  head.  When  the 
tree  machine  writes  on  the  output  tape,  it  also  shifts  the  output  head. 

The time complexity t(n) of a tree machine is defined in the natural way. The space
complexity s(n) of a tree machine is the total number of distinct cells visited by worktape heads.
Clearly, every tree machine of space complexity s(n) can be simulated by a Turing machine of
space complexity s(n), and vice versa.

The depth complexity of a tree machine is d(n) if every worktape head remains within
distance d(n) of the root of its worktape on every input of size n. It is possible to limit the depth
complexity of a tree machine with respect to its time complexity:

Theorem 2.2 (Paul and Reischuk, 1981; Loui, 1984a) Every tree machine running in time
t(n) can be simulated on-line by a tree machine running in time O(t(n)) and depth O(log t(n)).

2.3  Random  Access  Machines  (RAMs)

The random access machine (RAM) (Aho et al., 1974; Cook and Reckhow, 1973; Katajainen et
al., 1988) consists of the following: a finite sequence of labeled instructions; a memory consisting
of an infinite sequence of registers, indexed by nonnegative integer addresses (register r(j) has
address j); and a special register AC, called the accumulator, used for operating on data. Each
register, including AC, holds a nonnegative integer; initially all registers contain 0. Let ⟨x⟩
denote the contents of register r(x) and ⟨AC⟩ denote the contents of AC. Each cell on the
input and output tapes contains a symbol from a finite input/output alphabet. The following
RAM instructions are allowed:

input. Read the current input symbol into AC and move the input head one cell to the
right.

output. Write the binary representation of ⟨AC⟩ onto the output tape.

jump θ. Unconditional transfer of control to the instruction labeled θ.

jgtz θ. Transfer control to the instruction labeled θ if ⟨AC⟩ > 0.

load =C. Load integer C into AC.

load j. Load ⟨j⟩ into AC.

load *j. (Load indirect) Load ⟨⟨j⟩⟩ into AC.

store j. Store ⟨AC⟩ into r(j).

store *j. (Store indirect) Store ⟨AC⟩ into register r(⟨j⟩).

add j. Add ⟨j⟩ to ⟨AC⟩ and place the result in AC.

sub j. If ⟨j⟩ > ⟨AC⟩, then load 0 into AC; otherwise, subtract ⟨j⟩ from ⟨AC⟩ and place the
result in AC.

The length of a nonnegative integer i is the minimum positive integer w such that i ≤ 2^w - 1
(approximately the logarithm of i).

We  consider  two  time  complexity  measures  for  RAMs,  based  on  the  cost  of  each  RAM 
instruction.  For  the  unit-cost  RAM,  we  charge  each  instruction  one  unit  of  time.  For  the 
log-cost  RAM,  we  charge  each  instruction  according  to  the  logarithmic  cost  criterion  (Cook 
and  Reckhow,  1973):  the  time  for  each  instruction  is  the  sum  of  the  lengths  of  the  integers 
(addresses  and  register  contents)  involved  in  its  execution.  The  time  complexity  t(n)  of  a  RAM 
is  the  maximum  total  time  used  in  computations  on  inputs  of  length  n.  It  is  possible,  of  course, 
to  define  time  complexity  in  other  ways;  e.g.,  we  could  charge  some  other  function  f(j)  for 
access  to  register  j  (Aggarwal  et  al.,  1987). 
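
To make the difference between the two measures concrete, the Python sketch below charges a short sequence of store and add instructions under both criteria. It is only an illustration under stated assumptions: the length of i is taken to be the least positive w with i ≤ 2^w - 1 (the binary length), the integers charged for an instruction are assumed to be its operand address, the register contents it reads, and the accumulator value before the instruction executes, and the names `length` and `doubling_costs` are hypothetical.

```python
def length(i):
    """Least positive w with i <= 2**w - 1, i.e. the binary length of i."""
    if i < 0:
        raise ValueError("registers hold nonnegative integers")
    return max(1, i.bit_length())


def doubling_costs(k):
    """Run `store 1; add 1` k times, starting with 1 in AC.

    Each round doubles the accumulator. Returns the final accumulator
    value, the unit-cost time, and the log-cost time (under the charging
    assumptions stated in the surrounding text).
    """
    ac, r1 = 1, 0
    unit_time = 0
    log_time = 0
    for _ in range(k):
        # store 1: involves the address 1 and the accumulator contents.
        log_time += length(1) + length(ac)
        unit_time += 1
        r1 = ac
        # add 1: involves the address 1, the contents of r(1), and AC.
        log_time += length(1) + length(r1) + length(ac)
        unit_time += 1
        ac = ac + r1
    return ac, unit_time, log_time


if __name__ == "__main__":
    for k in (3, 10, 20):
        print(k, doubling_costs(k))
```

The unit-cost time grows linearly in k, while the log-cost time grows quadratically because the operands themselves grow by one bit per round.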

In  our  simulations  involving  RAMs,  we  group  the  registers  into  a  finite  number  of  memories, 
each  memory  containing  an  infinite  number  of  registers.  This  does  not  increase  the  cost  in  time 
by  more  than  a  constant  factor,  since  we  could  simply  interleave  these  memories  into  one 
memory  (Katajainen  et  al.,  1988). 

We  discuss  the  space  complexity  of  RAMs  in  Section  3.5. 

Two RAM operations used often in this thesis are the pack and unpack operations. Let
r_1, r_2, ..., r_b be contiguous registers in RAM R's memory containing, respectively,
x_1, x_2, ..., x_b, where each x_j is a single bit. R packs r_1, r_2, ..., r_b by computing the
single b-bit value 2^{b-1} x_1 + 2^{b-2} x_2 + ... + x_b and placing this value into the accumulator
(see Figure 2.1 for an example).

Figure 2.1: Packing six bits into the accumulator

The unpack operation is the inverse of the pack operation; R takes a single value in the
accumulator and stores its bits into contiguous registers. Each operation has as parameters the
beginning and ending addresses of the registers involved in the operation.
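
The following Python sketch shows what pack and unpack compute. It uses a plain left-to-right loop rather than the table-based divide-and-conquer technique cited in this section, so it illustrates only the input-output behavior, not the running time; the function names and the six-bit example value are hypothetical.

```python
def pack(bits):
    """Combine bits x_1, ..., x_b into 2**(b-1)*x_1 + ... + x_b."""
    value = 0
    for x in bits:
        if x not in (0, 1):
            raise ValueError("each register must hold a single bit")
        value = 2 * value + x   # shift previous bits left, append x
    return value


def unpack(value, b):
    """Inverse of pack: split a b-bit value into its bits, x_1 first."""
    if not 0 <= value < 2 ** b:
        raise ValueError("value does not fit in b bits")
    return [(value >> (b - 1 - j)) & 1 for j in range(b)]


if __name__ == "__main__":
    registers = [1, 0, 1, 1, 0, 1]
    packed = pack(registers)
    print(packed, unpack(packed, len(registers)))
```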

We use a technique of Katajainen et al. (1988) to pack and unpack registers. This
divide-and-conquer strategy involves precomputed shift tables:


Lemma 2.3 (Katajainen et al., 1988) If the proper tables are available, then it is possible to
pack u bits into the accumulator, and to unpack a u-bit string into memory, both in O(u log u)
time on a log-cost RAM.


Lemma 2.4 (Katajainen et al., 1988) The tables necessary for Lemma 2.3 can be built in
O(u 2^u) time on a log-cost RAM.


2.4  Unit-cost  Successor  RAMs 


A variation of the RAM that we consider is the successor RAM (SRAM) (see Schönhage,
1980). The SRAM is defined almost exactly like the RAM, except that instead of add and
subtract instructions, the SRAM has the successor instruction, which adds 1 to the value in the
accumulator, and the empty instruction, which loads 0 into the accumulator. We are particularly
interested in the SRAM with the unit-cost time measure, because of the relationship between
the unit-cost SRAM and the pointer machine (see Section 3.4).

2.5  Pointer  Machines 

The  following  definition  for  pointer  machines  is  drawn  from  the  pointer  machine  definitions  of 
Schönhage (1980) and Halpern et al. (1986).

A Δ-structure, which provides the storage for the pointer machine, is a directed graph
consisting of nodes (vertices) and pointers (edges). Each node has a finite number of outgoing
pointers, and each pointer from a node has a distinct label. The labels are symbols from the
finite pointer alphabet Δ. At any time, one node, designated the center, is used to access the
Δ-structure. We refer to the center node as x_0.

We describe an instantaneous configuration of the Δ-structure by a set of pointer mappings:
for all δ ∈ Δ, there is a pointer mapping p_δ : X → X, where X is the set of nodes; p_δ(x) = y
means the δ pointer from node x points to node y. From these pointer mappings, we can
recursively define the mapping p* : Δ* → X:

p*(λ) = x_0 (where λ is the empty word)

p*(Wδ) = p_δ(p*(W)), for all δ ∈ Δ, W ∈ Δ*.

The pointer machine also has a separate read-only input tape containing symbols from an
input alphabet Σ. For simplicity, we consider only pointer machines that accept or reject their
input.  With  the  addition  of  a  write-only  output  tape,  we  could  also  consider  pointer  machines 
that  produce  output. 

A pointer machine has a finite sequence of program instructions, each with some distinct
instruction  label.  We  define  the  following  pointer  machine  instructions: 

accept.  Self-explanatory.  Computation  halts. 

reject.  Self-explanatory.  Computation  halts. 

create W, where W ∈ Δ*. Create a new node x' in the Δ-structure. If W = Uδ, where
δ ∈ Δ and U ∈ Δ*, then set the δ pointer of node p*(U) to point to x'. For all γ ∈ Δ, p_γ(x')
is set to x_0.

center W, where W ∈ Δ*. Make the node p*(W) the new center.

assign W := V, where W, V ∈ Δ*. If W = Uδ, then set the δ pointer of p*(U) to point to
p*(V).

if W = V go to p, where W, V ∈ Δ* and p is an instruction label. If p*(W) = p*(V), then
pass control to the instruction labeled p. Otherwise, execute the next instruction.

if input = a go to p, where p is an instruction label. If the input symbol is a, then pass
control to the instruction labeled p. Otherwise, execute the next instruction.

move p, where p ∈ {left, right}. Move the input tape head one square in the direction
indicated by p.

The pointer machine starts with the input head on the leftmost nonblank input symbol and
one node in the Δ-structure. We call this node, which is the center when computation begins,
the initial node.
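
A minimal Python sketch of the storage operations may help fix the definitions. It is an illustration, not the definition itself: the class and method names are hypothetical, words are represented as Python strings of single-character labels, nodes are numbered in order of creation, and the pointers of the initial node are assumed to point to the initial node itself (the text above does not specify them).

```python
class DeltaStructure:
    """Storage of a pointer machine over a finite pointer alphabet."""

    def __init__(self, alphabet):
        self.alphabet = tuple(alphabet)
        # Node 0 is the initial node. Assumption: its pointers point to itself.
        self.pointers = [{label: 0 for label in self.alphabet}]
        self.center = 0

    def resolve(self, word):
        """p*(word): follow the labels of `word` starting from the center."""
        node = self.center
        for label in word:
            node = self.pointers[node][label]
        return node

    def create(self, word):
        """create W: add a node whose pointers all point to the center."""
        owner = self.resolve(word[:-1]) if word else None
        new = len(self.pointers)
        self.pointers.append({label: self.center for label in self.alphabet})
        if word:
            self.pointers[owner][word[-1]] = new
        return new

    def set_center(self, word):
        """center W: make p*(W) the new center."""
        self.center = self.resolve(word)

    def assign(self, word, target_word):
        """assign W := V with W = U + label: redirect that pointer to p*(V)."""
        if not word:
            raise ValueError("W must end with a pointer label")
        target = self.resolve(target_word)
        owner = self.resolve(word[:-1])
        self.pointers[owner][word[-1]] = target

    def same_node(self, w, v):
        """Test used by the conditional instruction: p*(W) == p*(V)."""
        return self.resolve(w) == self.resolve(v)


if __name__ == "__main__":
    s = DeltaStructure("ab")
    s.create("a")
    s.create("aa")
    print(s.resolve("aa"), s.same_node("aab", ""))
```

The number of nodes created so far is `len(s.pointers) - 1`, which corresponds to counting executed create instructions.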

The time consumed by the pointer machine is the number of instructions executed before
halting. We consider both unit-cost and logarithmic-cost space measures. Define mass to be
the number of create instructions executed before halting, i.e., the number of nodes created
during execution. Mass was introduced as a measure of space by Halpern et al. (1986). Define
the capacity of a computation to be dn log n, where n is the number of nodes created, and d is
the number of pointers per node (d = |Δ|).

The idea for considering capacity as a space measure comes from Borodin et al. (1981). With
n nodes there are at most n^{dn} possible configurations of the Δ-structure. Borodin et al. (1981)
defined control space (capacity) as log(Q), where Q is the number of possible configurations,
so our definition of pointer machine capacity is reasonable. Since the standard definition of
Turing  machine  space  does  not  account  for  the  size  of  the  tape  alphabet,  we  could  also  define 
a  logarithmic  space  measure  for  pointer  machines  without  considering  the  size  of  the  pointer 
alphabet.  For  this  thesis,  however,  we  use  the  capacity  measure  as  defined. 
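
The counting behind the capacity measure can be written out as a short derivation. This is a sketch under the assumption that a configuration is determined by where each of the d pointers of each of the n nodes points; Q denotes the number of such configurations, as above.

```latex
% Each of the n nodes has d = |\Delta| outgoing pointers,
% and each pointer can point to any of the n nodes.
\begin{align*}
Q &\le \underbrace{n^{d} \cdot n^{d} \cdots n^{d}}_{n \text{ nodes}} = n^{dn}, \\
\log Q &\le \log n^{dn} = dn \log n .
\end{align*}
```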

Proposition 2.5 Every pointer machine of mass complexity s(n) has capacity complexity
O(s(n) log s(n)) and every pointer machine of capacity complexity s(n) has mass complexity
O(s(n)/log s(n)).

2.6  Nondeterministic  and  Alternating  Machines 

Although most of our results involve sequential models of computation, there are a few
theorems about nondeterministic and alternating machines. We assume the reader is familiar with
nondeterministic Turing machines as described by Hopcroft and Ullman (1979).

The concept of alternation (Chandra et al., 1981) builds on the idea of nondeterminism.
In an alternating Turing machine, as in a nondeterministic Turing machine, each configuration
can reach several successor configurations in one step. Each state is universal or existential. A
configuration of the alternating machine is universal if the state is universal, and it is existential
if the state is existential. An existential configuration is accepting if at least one of its successor
configurations is accepting. A universal configuration is accepting only if all of its successor
configurations are accepting. A configuration with no successor configuration is accepting if and
only if its state is accepting. Thus a nondeterministic Turing machine is an alternating
Turing machine with all existential configurations.
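
The acceptance rule can be sketched as a recursive evaluation over a finite, acyclic configuration graph. The Python code below is an illustration only: the dictionary-based representation, the function name `accepts`, and the example configuration names are hypothetical, and a universal configuration is treated as accepting exactly when all of its successors are accepting.

```python
def accepts(config, successors, kind, accepting_leaves):
    """Decide whether `config` is accepting.

    successors: dict mapping a configuration to its list of successors
    kind: dict mapping a non-leaf configuration to "existential" or "universal"
    accepting_leaves: set of successor-free configurations in accepting states
    Assumes the configuration graph is finite and acyclic.
    """
    next_configs = successors.get(config, [])
    if not next_configs:
        return config in accepting_leaves
    results = (
        accepts(c, successors, kind, accepting_leaves) for c in next_configs
    )
    if kind[config] == "existential":
        return any(results)   # at least one accepting successor
    return all(results)       # every successor must be accepting


if __name__ == "__main__":
    successors = {"root": ["a", "b"], "a": ["a1", "a2"]}
    kind = {"root": "existential", "a": "universal"}
    print(accepts("root", successors, kind, {"a1", "b"}))
```

If every entry of `kind` is "existential", the evaluation reduces to asking whether some computation path reaches an accepting leaf, which matches the nondeterministic case.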

Nondeterministic  and  alternating  pointer  machines  are  defined  analogously. 

2.7  Miscellaneous  Definitions 

Let resource_1 be time or space and resource_2 be time or space. We say that machine M of
resource_1 complexity r is simulated by machine M' in resource_2 f(r) if for every input string,
M' produces the same output as M in resource_2 f(r). We call M' the host and M the guest in
the simulation.

We say M is simulated by a machine M' on-line in time f(t) if for every time step t_i
where M reads/writes an input/output symbol, there is a corresponding time step t'_i where
M' reads/writes the same symbol, and t'_i ≤ f(t_i). If M' simulates M in time f(t), but the
simulation is not on-line, then we say that M' simulates M off-line. Clearly, if M' simulates M
on-line in time f(t), then we can modify M' to simulate M off-line in time f(t). The converse
may not be true. The distinction between on-line and off-line simulations is meaningless if the
simulated machine reads all of its input before writing any output (e.g., a machine that only
accepts or rejects its input).

We say M is simulated by M' in real time if there is a constant c such that the following holds:
if M reads/writes an input/output symbol at time steps t_0 < t_1 < ... < t_ℓ, then M' reads/writes
the same symbol at time steps t'_0 < t'_1 < ... < t'_ℓ, and for 1 ≤ i ≤ ℓ,
t'_i - t'_{i-1} ≤ c(t_i - t_{i-1}). If machine M simulates M' in real time and M' simulates M in
real time, then M and M' are real-time equivalent.
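
The two timing conditions can be checked mechanically on recorded input/output time steps. The Python sketch below assumes the readings t'_i ≤ f(t_i) for an on-line simulation in time f(t) and t'_i - t'_{i-1} ≤ c(t_i - t_{i-1}) for a real-time simulation; the function names and the sample time steps are hypothetical.

```python
def is_online_within(guest_times, host_times, f):
    """Check t'_i <= f(t_i) for every input/output step i."""
    if len(guest_times) != len(host_times):
        raise ValueError("guest and host must perform the same I/O steps")
    return all(th <= f(tg) for tg, th in zip(guest_times, host_times))


def is_real_time(guest_times, host_times, c):
    """Check t'_i - t'_(i-1) <= c * (t_i - t_(i-1)) for 1 <= i <= l."""
    if len(guest_times) != len(host_times):
        raise ValueError("guest and host must perform the same I/O steps")
    return all(
        host_times[i] - host_times[i - 1]
        <= c * (guest_times[i] - guest_times[i - 1])
        for i in range(1, len(guest_times))
    )


if __name__ == "__main__":
    guest = [1, 3, 6]    # steps at which the guest M performs I/O
    host = [2, 5, 12]    # corresponding steps of the host M'
    print(is_online_within(guest, host, lambda t: 2 * t))
    print(is_real_time(guest, host, c=3))
```

The real-time condition bounds the delay between consecutive input/output steps, whereas the on-line condition bounds only the total elapsed time at each such step.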

Chapter 3


Literature  Review 


This  chapter  surveys  research  into  the  relative  power  of  computational  models.  We  begin  with 
a  review  of  results  on  Turing  machines  and  RAMs. 

3.1  Turing  Machines 

Early results in computational complexity involved the deterministic Turing machine. Hopcroft
and Ullman (1979) presented many of these results, such as linear speedup, space compression,
and time and space hierarchies.

Hennie and Stearns (1966) showed that Turing machines running in time t accept more
languages than Turing machines running in time o(t/log t). This was an improvement of an
earlier time hierarchy presented by Hartmanis and Stearns (1965b). Cook and Reckhow (1973)
presented a tighter hierarchy for log-cost RAMs; specifically, RAMs running in time t accept
more languages than RAMs running in time o(t). Paul (1979) substantially tightened the time
hierarchy for k-tape Turing machines, for fixed k > 2; he showed that the class of languages
accepted by k-tape machines in time t is strictly contained in the class of languages accepted
by k-tape machines in time o(t log* t). Later, Fürer (1984) showed that the time hierarchy is as
tight for multitape Turing machines (with a fixed number of tapes) as it is for RAMs.

The  definition  of  the  Turing  machine  model  has  been  modified  in  many  ways  to  investigate 
how  different  enhancements  of  the  model  affect  its  computational  power.  One  such  variation 
already mentioned is the multitape Turing machine. Hennie and Stearns (1966) showed that a
k-tape machine can be simulated by a 2-tape machine within a logarithmic time factor. Paul
et al. (1981) proved that a k-tape machine cannot be simulated in real time by a Turing
machine with fewer than k tapes. Their proof relied on information-theoretic techniques from
Kolmogorov complexity, which has since been used to prove lower bound results for other
on-line simulations (e.g., Loui, 1983), including some results in this thesis. We discuss Kolmogorov
complexity in more detail in Subsection 5.2.2.

Another variant of the Turing machine is the multidimensional Turing machine. Hennie
(1966) and Grigor’ev (1977) showed that for all e > d, a d-dimensional Turing machine requires
time Ω(t^{1+(1/d)-(1/e)}) to simulate on-line an e-dimensional Turing machine running in
time t. Pippenger (1982) showed that this lower bound holds even if the simulating machine
is probabilistic (a probabilistic machine’s computation path is determined by a series of “coin
flips”). Pippenger and Fischer (1979) achieved this lower bound for d = 1: they showed that
every e-dimensional Turing machine running in time t can be simulated by a one-dimensional
Turing machine in time O(t^{2-(1/e)}). Loui (1982) presented a near-optimal upper bound for d > 2
by showing that every e-dimensional Turing machine of time complexity t can be simulated
on-line by a deterministic d-dimensional Turing machine in time O(t^{1+(1/d)-(1/e)} (log t)^{O(1)}).
Pippenger (1982) presented simulations of e-dimensional Turing machines of time complexity t
by probabilistic d-dimensional Turing machines in time O(t^{1+(1/d)-(1/e)} (log t)^{1/d}).

3.2  Random  Access  Machines  (RAMs) 

Research  into  the  complexity  of  RAMs  has  included  investigation  of  how  different  definitions 
of the time measure and different instruction sets affect the time complexity of the RAM.
Hartmanis (1971) described the Random Access Stored-Program Machine (RASP), a RAM
whose  program  is  stored  in  memory.  Storing  the  program  in  memory  allows  for  alterations  in 
the  program  during  execution.  Cook  and  Reckhow  (1973)  compared  the  time  complexity  of  the 
RAM  with  the  time  complexity  of  the  RASP.  They  showed  that  the  RAM  and  the  RASP  are 
real-time  equivalent. 

It is clear that every log-cost RAM can be simulated by a unit-cost RAM in real time.
Paul and Reischuk (1981) showed that every log-cost RAM running in time t can be simulated
off-line by a unit-cost RAM running in time $O(t/\log\log t)$.

Fischer (1975) showed that unit-cost SRAMs can simulate log-cost RAMs with addition and
subtraction in real time. Dymond (1977) proved that every unit-cost SRAM running in time t
can be simulated on-line by a log-cost RAM in time $O(t \log t)$. A faster on-line simulation is not
known, but Schönhage (1988) exhibited a separation between the log-cost RAM and unit-cost
SRAM by showing that a log-cost RAM requires $\Omega(n \log^* n)$ time just to read and store (on-line)
an arbitrary input of n bits.

With two such different models of computation (the Turing machine and the RAM), it is
natural to examine how they differ in their use of time and space. Cook and Reckhow (1973)
investigated relationships between Turing machines and log-cost RAMs. They showed that
every Turing machine running in time t can be simulated by a log-cost RAM running in time
$O(t \log t)$. They also showed that every log-cost RAM running in time t can be simulated in time
$O(t^2)$ by a Turing machine. For unit-cost RAMs running in time t, they gave a simulation by a
Turing machine in time $O(t^3)$. Katajainen et al. (1988) improved the first result: they showed
that every Turing machine of time complexity t and space complexity s can be simulated by a
log-cost RAM running in time $O(t \log\log s)$ (hence in time $O(t \log\log t)$). Wiedermann (1983)
improved the simulation of a log-cost RAM by a Turing machine to run in time $O(t^2/\log t)$. Loui
(1983) proved that every log-cost RAM running in time t can be simulated by a d-dimensional
Turing machine running in time $O(t^{1+(1/d)}/\log t)$.

Hopcroft et al. (1975) presented an off-line simulation of Turing machines running in time
t by unit-cost RAMs running in time $O(t/\log t)$. The RAM precomputes a table of configurations
of the Turing machine and simulates the Turing machine using simple table look-ups.
This technique is also used in the famous four Russians algorithm for matrix multiplication
(Arlazarov et al., 1970). Galil (1976) extended the result of Hopcroft et al. (1975) by showing
how the simulation could be converted into an on-line simulation.

3.3  Tree  Machines 

We have seen how additional tapes or more dimensions affect the complexity of the Turing
machine. Researchers have also investigated the effect of more radical variations of the storage
structure on the computational complexity of the Turing machine. One variation is the tree
machine, whose storage tape is a complete infinite rooted binary tree.

Clearly, every one-dimensional Turing machine can be simulated by a tree machine in real
time. Reischuk (1982) showed that every e-dimensional Turing machine of time complexity t
can be simulated on-line by a tree machine in time $O(t\,5^{e \log^* t})$. It is not known whether this
simulation is optimal. Pippenger (1982) showed that every multidimensional Turing machine
can be simulated on-line by a probabilistic tree machine in linear time.

Pippenger and Fischer (1979) showed that every tree machine of time complexity t can be
simulated on-line by a one-dimensional Turing machine in time $O(t^2/\log t)$. Extending this
result, Loui (1983) showed that every tree machine running in time t can be simulated by a
d-dimensional Turing machine in time $O(t^{1+(1/d)}/\log t)$. He also used Kolmogorov complexity
to show that this simulation is optimal.

Some relationships between tree machines and RAMs have also been established. Paul and
Reischuk (1981) showed that every log-cost RAM can be simulated by a tree machine in real
time. They also showed that every tree machine running in time t can be simulated off-line by
a unit-cost RAM in time $O(t/\log\log t)$.

3.4  Pointer  Machines 

Schönhage (1980) introduced the Storage Modification Machine, also known as the pointer
machine. He showed that the pointer machine is real-time equivalent to the unit-cost SRAM;
therefore, results already mentioned concerning the time complexity of unit-cost SRAMs apply
as well to pointer machines, and vice versa. Besides establishing the relationship between
pointer machines and SRAMs, Schönhage also showed that pointer machines can simulate
multidimensional Turing machines in real time.

Tables 3.1 and 3.2 contain a summary of the relationships between the various models.
These tables also include results from this thesis (with accompanying theorem numbers). In
these tables, $X \xrightarrow{f} Y$ means that machine X can be simulated by machine Y in time $O(f)$.

3.5  Space  Measures 

We  have  already  mentioned  the  existence  of  a  space  hierarchy  for  Turing  machines.  There  are, 
of  course,  other  results  concerning  the  use  of  space  on  the  various  computational  models.  For 
instance, Slot and van Emde Boas (1988) studied the relationship of Turing machine space to
RAM space. One of their goals was to establish whether a Turing machine using space s could
simulate a RAM using space O(s), and vice versa.

| Guest \ Host | Turing machine | d-Turing machine | tree machine |
|---|---|---|---|
| Turing machine | – | real-time (straightforward) | real-time (straightforward) |
| e-Turing machine | $O(t^{2-(1/e)})$ (Pippenger & Fischer, 79) | $O(t^{1+(1/d)-(1/e)}(\log t)^{O(1)})$ (Loui, 82); $\Omega(t^{1+(1/d)-(1/e)})$ (Hennie, 66) and (Grigor’ev, 77) | $O(t\,5^{e \log^* t})$ (Reischuk, 82) |
| tree machine | $O(t^2/\log t)$ (Pippenger & Fischer, 79) | $O(t^{1+(1/d)}/\log t)$ (Loui, 83) | – |
| log-cost RAM | $O(t^2/\log t)$ (Wiedermann, 83); $\Omega(t^2/(\log t\,(\log\log t)^2))$ (Theorem 6.4) | $O(t^{1+(1/d)}/\log t)$ (Loui, 83); $\Omega(t^{1+(1/d)}/(\log t\,(\log\log t)^{1+(1/d)}))$ (Theorem 6.4) | real-time (Paul & Reischuk, 81) |
| unit-cost RAM | $O(t^3)$ (Cook & Reckhow, 73) | $O(t^2 \log t)$ (Theorem 6.5); $\Omega(t^{1+(1/d)}/\log t)$ (Theorem 6.6) | $O(t^2)$ (log RAM $\xrightarrow{t}$ tree mach; unit RAM $\xrightarrow{t^2}$ log RAM) |
| unit-cost SRAM (pointer machine) | $O(t^2 \log t)$ (Wagner & Wechsung, 86) | $O(t^{1+(1/d)}(\log t)^{1/d})$ (Theorem 6.7); $\Omega(t^{1+(1/d)}/\log t)$ (Theorem 6.8) | $O(t \log t)$ (log RAM $\xrightarrow{t}$ tree mach; unit SRAM $\xrightarrow{t \log t}$ log RAM) |

Table 3.1: Simulation time bounds (Turing and tree machines)

| Guest \ Host | log-cost RAM | unit-cost RAM | unit-cost SRAM (pointer machine) |
|---|---|---|---|
| Turing machine | $O(t \log\log t)$ (Katajainen et al., 88) | $O(t/\log t)$; off-line: (Hopcroft et al., 75); on-line: (Galil, 76) | real-time (straightforward) |
| e-Turing machine | $O(t(\log t)^{1-(1/e)}(\log\log t)^{1/e})$ (Theorem 6.1) | $O(t/(\log t)^{1/e})$; off-line: (Grigor’ev, 79); on-line: (Theorem 6.2) | real-time (Schönhage, 80) |
| tree machine | $O((t \log t)/\log\log t)$ (Theorem 5.2) | off-line: $O(t/\log\log t)$ (Paul & Reischuk, 81); on-line: real-time (Theorem 5.1) | real-time (straightforward) |
| log-cost RAM | – | off-line: $O(t/\log\log t)$ (Paul & Reischuk, 81); on-line: real-time (straightforward) | $O(t)$ (Fischer, 75) |
| unit-cost RAM | $O(t^2)$ (straightforward) | – | $O(t^2)$ (log RAM $\xrightarrow{t}$ unit SRAM; unit RAM $\xrightarrow{t^2}$ log RAM) |
| unit-cost SRAM (pointer machine) | $O(t \log t)$ (straightforward); $\Omega((t \log t)/\log\log t)$ (Theorem 5.9) | real-time (straightforward) | – |

Table 3.2: Simulation time bounds (RAMs)

Slot and van Emde Boas considered two logarithmic measures for RAM space: $size_a$, which
charges only the length of the contents of each register used, and $size_b$, which accounts for both
the contents and the address of each register. They showed that both measures allow for space
equivalence of RAMs and Turing machines; however, the proof of space equivalence with $size_a$
relied on a simulation of a RAM by a Turing machine in exponential time. On the other hand,
simulations based on $size_b$ used linear space and polynomial time in both directions. Thus the
second space measure, which seems more intuitive, also seems to allow for a closer relationship
between the two models.

One important issue in computational complexity is the relationship between the time and space
complexity of a particular model. Hopcroft et al. (1977) showed that “space is strictly more
powerful as a resource for deterministic multitape Turing machines”: they proved that every
deterministic multitape Turing machine running in time t can be simulated by a Turing machine
using space $t/\log t$. Consequently, by the space hierarchy for Turing machines, the class of
languages accepted by a Turing machine in time t is strictly contained in the class of languages
accepted in space t. Adleman and Loui (1981) presented an alternative proof of the result of
Hopcroft et al.

Paul and Reischuk (1981) showed that every d-dimensional Turing machine running in time
t can be simulated by another d-dimensional Turing machine in space $t\,5^{d \log^* t}/\log t$. Pippenger
(1982) improved the time-space result for multidimensional Turing machines, showing that
there is a simulation that runs in space $t/\log t$.

| Model | Time-Space Relationship |
|---|---|
| Turing machine | $t \to t/\log t$ (Hopcroft et al., 75) |
| d-Turing machine | $t \to t/\log t$ (Pippenger, 82) |
| log-cost RAM | $t \to t/\log t$ (Paul and Reischuk, 81) |
| pointer machine (mass) | $t \to t/\log t$ (Halpern et al., 86) |
| pointer machine (capacity) | $t \to t$ (Proposition 2.5) |
| tree machine | $t \to t/\log t$ (Paul and Reischuk, 81) |

Table 3.3: Time vs. space on computational models


Paul and Reischuk (1981) also showed that every log-cost RAM running in time t can be
simulated by a Turing machine in space $O(t/\log t)$. By space equivalence of RAMs and Turing
machines (Slot and van Emde Boas, 1988), we have the same time-space result for RAMs as
for Turing machines. Another result of Paul and Reischuk is that every tree machine of time
complexity t can be simulated by a Turing machine in space $O(t/\log t)$. By space equivalence of
tree machines and Turing machines, we again duplicate the Turing machine time-space result,
this time for tree machines.

Halpern et al. (1986) investigated the relationship of time and space complexity in pointer
machines. They showed that every pointer machine running in time t can be simulated in
mass $O(t/\log t)$. Table 3.3 summarizes these time vs. space results ($f \to g$ means that a
machine running in time f can be simulated by a machine running in space $O(g)$).

Van Emde Boas (1989) showed that Turing machines running in space s accept exactly
the same languages as pointer machines using $O(s/\log s)$ nodes; however, his simulation of a
Turing machine by a pointer machine uses quadratic time. This thesis contains an improvement
to this result: our simulation runs in linear time with the same space complexity as the
simulation of van Emde Boas.


3.6  Nondeterminism  and  Alternation 


Paul et al. (1983) proved that nondeterministic two-tape Turing machines running in linear
time accept more languages than k-tape deterministic Turing machines running in linear time
for any constant k. The best known simulation of a nondeterministic Turing machine of time
complexity t by a deterministic Turing machine takes time $O(1)^t$ (Hopcroft and Ullman, 1979;
Yap, 1987). Unfortunately, none of this work has yet led to an answer to the notorious P =
NP question.

Savitch (1970) used a recursive search of machine configurations to prove that every
nondeterministic Turing machine with space complexity s can be simulated by a deterministic Turing
machine using space $s^2$. By Savitch’s result we know that the class of languages accepted in
polynomial space on a deterministic Turing machine is equivalent to the class of languages
accepted in polynomial space on a nondeterministic Turing machine (PSPACE = NPSPACE).
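
The recursive search can be illustrated with a minimal Python sketch. It is not taken from Savitch's paper: the names `reachable`, `configs`, and `step` are hypothetical, and the configuration graph is passed in explicitly rather than derived from a Turing machine. The sketch asks whether configuration `v` is reachable from `u` in at most 2^k steps by trying every possible midpoint, so the recursion depth is k and only one configuration is remembered per level.

```python
def reachable(u, v, k, configs, step):
    """Return True if v can be reached from u in at most 2**k steps.

    configs: iterable of all configurations (an assumption of this sketch).
    step(a, b): True if b follows from a in one move.
    """
    if k == 0:
        # Base case: zero steps or exactly one step.
        return u == v or step(u, v)
    # Try every midpoint w; each half is searched with the bound 2**(k-1).
    return any(
        reachable(u, w, k - 1, configs, step)
        and reachable(w, v, k - 1, configs, step)
        for w in configs
    )
```

With configurations of length O(s) and k = O(s) levels, a search of this shape stores O(s) configurations of size O(s) each, which is where a quadratic space bound of this kind comes from.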

Chandra et al. (1981) established the fundamental properties of alternating Turing machines.
They showed, for instance, that every nondeterministic Turing machine of space complexity
s can be simulated by an alternating Turing machine running in time $O(s^2)$. Conversely,
every alternating Turing machine of time complexity t can be simulated by a deterministic Turing
machine using space t. An important corollary to these two results is that the class of
languages accepted in deterministic polynomial space (PSPACE) is equivalent to the class of
languages accepted in alternating polynomial time (APTIME).

For alternating Turing machines, Paul et al. (1980a) showed that decreasing the number
of tapes does not increase the time complexity; i.e., every alternating Turing machine running
in time t can be simulated by a one-tape alternating Turing machine with no time loss.
Furthermore, they showed that every nondeterministic Turing machine of time complexity t
can be simulated by an alternating Turing machine running in time $t^{1/2}$. Paul and Reischuk
(1980b) proved that every deterministic Turing machine of time complexity t can be simulated
by an alternating Turing machine running in time $(t \log\log t)/\log t$. Dymond and Tompa (1985)
improved this last result with a simulating alternating Turing machine running in time $t/\log t$.

Chapter 4


Pointer  Machines 


In  this  chapter,  we  present  several  results  concerning  both  time  and  space  complexity  of  pointer 
machines.  These  results  indicate  that,  in  many  ways,  pointer  machines  are  similar  to  Turing 
machines  and  RAMs. 

4.1  Notation 

We  present  a  standard  notation  used  in  this  chapter  to  describe  the  complexity  classes  defined 
by  pointer  machines  and  Turing  machines. 

Let M be TM for Turing machines and PM for pointer machines. M-TIME(f) is the class
of languages accepted by the machine type specified by M in time f, where f is a function of
the input length. TM-SPACE(f) is the class of all languages accepted by Turing machines in
space f. PM-CAPACITY(f) and PM-MASS(f) are the space complexity classes for pointer
machines. To specify complexity classes for nondeterministic versions of these machines, we
prefix the machine type with N; for alternating machines, A.

4.2  Space  Compression 

Our  first  result  is  a  space  compression  theorem  for  pointer  machines. 

Theorem 4.1 For every constant c > 0, every pointer machine with mass complexity s(n) can
be simulated by some pointer machine with mass complexity cs(n).

Proof. We show the case where c = 1/2. Consider a pointer machine A having mass complexity
s(n). We design a pointer machine B that simulates A and has mass complexity s(n)/2.

For B to compute with half the number of nodes of A, we encode two nodes of A into one
node of B with the addition of several pointers. For every δ in the pointer alphabet of A, the
pointer alphabet of B has δ(1,1), δ(1,2), δ(2,1), δ(2,2). Each node in B corresponds to a
pair of nodes in A. The ordered pairs in the pointer notation indicate the original source and
destination nodes.

We also create one node (called G) to hold “useless” pointers. We need a pointer γ
that points to G from any node, so that G is always accessible. The γ pointer from G always points
to the last node created.

We establish the correspondence between nodes of A and nodes of B as follows. Call a node
in B a node-pair to distinguish it from the pair of nodes in A to which it corresponds. Designate
the older node in a pair of nodes in A as node 1 and the other as node 2. If the δ pointer of
node 1 in a pair corresponding to node-pair a in B points to node 1 in a pair corresponding
to node-pair b, then $p_{\delta(1,1)}(a) = b$ and $p_{\delta(1,2)}(a) = G$. Other cases are handled similarly (see
Figure 4.1).

Since we are working with node-pairs in B, we need to designate where the center is within
a pair. The structure of B tells us whether a node-pair contains the center of the structure of
A by identifying that node-pair as the center. Then we can use a pointer φ and two additional
nodes $H_1$ and $H_2$ in B to tell us whether the center is node 1 or node 2 by having φ point to
$H_1$ or $H_2$, as appropriate.

We initialize B by creating nodes G, $H_1$, and $H_2$. After we set their pointers appropriately,
we are ready to simulate A.

Figure 4.1: Reducing the number of nodes by 1/2 (panels: ORIGINAL, showing the initial node and the current center; SIMULATION, with nodes numbered in the sequence in which they were created)

Rather than describe the simulation of A in tedious detail, we discuss how to simulate one
instruction, create δ. The other instructions are simulated analogously.

To simulate create δ, we first find the last node-pair created by following the γ pointer of
G. Call this node-pair a. We then determine whether node-pair a corresponds to a single node
in A (if an odd number of nodes have been created in the execution of A at this point) or to
an actual pair. If $p_{\delta(2,1)}(a) = G$ and $p_{\delta(2,2)}(a) = G$, then the node to be created in A is the
second in the node-pair a. In this case, we assign the appropriate pointers from the current
center to a, and we also assign the appropriate pointer (either δ(2,1) or δ(2,2)) to the current
center from a.

In the case where at least one of δ(2,1) or δ(2,2) does not point to G, we must create a new
node-pair in B. Then we make the appropriate pointer assignments from this new node-pair to
the current center and to G.

With the addition of a few extra pointers, we can eliminate $H_1$, $H_2$, and G. We simply
encode the information these nodes provide with extra pointers from the initial node. For
example, we could substitute pointers $\varphi_1$ and $\varphi_2$ for φ. One of these two would point to the
current center node-pair from the initial node to indicate whether the center is node 1 or node
2.

If A creates s(n) nodes, then B creates $\lceil s(n)/2 \rceil$ nodes (if we eliminate $H_1$, $H_2$, and G).
We can then generalize the procedure (or continue to apply it repetitively) to achieve space
complexity cs(n) for any c < 1. □

Note that this simulation does not establish space compression for capacity complexity: if
the pointer alphabet size of the original machine is d, then the alphabet size of the simulator
is $d(1/c)^2$.
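
The pairing used in the proof of Theorem 4.1 can be sketched in Python for a static pointer structure. This is only an illustration under stated assumptions: nodes of A are identified by their creation index, the names `pack` and `unpack` are hypothetical, and the instruction-by-instruction simulation (including the γ and φ pointers) is not modelled. Node k of A becomes slot (k mod 2) + 1 of node-pair ⌊k/2⌋, and a pointer δ from slot i to slot j is stored as δ(i, j); every unused δ(i, j) points to the dummy node G.

```python
G = "G"  # dummy node that receives every "useless" pointer


def pack(a_nodes):
    """a_nodes[k][delta] is the index of the node that pointer delta of node k
    points to. Returns one dict per node-pair, keyed by (delta, i, j)."""
    alphabet = sorted({d for ptrs in a_nodes for d in ptrs})
    n_pairs = (len(a_nodes) + 1) // 2  # ceil(s / 2) node-pairs
    b_nodes = [
        {(d, i, j): G for d in alphabet for i in (1, 2) for j in (1, 2)}
        for _ in range(n_pairs)
    ]
    for k, ptrs in enumerate(a_nodes):
        for d, target in ptrs.items():
            # Source slot is k % 2 + 1, destination slot is target % 2 + 1.
            b_nodes[k // 2][(d, k % 2 + 1, target % 2 + 1)] = target // 2
    return b_nodes


def unpack(b_nodes, n):
    """Inverse of pack for a structure with n original nodes."""
    a_nodes = [dict() for _ in range(n)]
    for p, ptrs in enumerate(b_nodes):
        for (d, i, j), q in ptrs.items():
            if q != G:
                a_nodes[2 * p + i - 1][d] = 2 * q + j - 1
    return a_nodes
```

Each node-pair carries four pointers per original symbol, which matches the alphabet growth noted above for c = 1/2.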

Although  space  compression  is  possible  using  mass  as  the  space  measure,  it  is  unclear 
whether  pointer  machines  also  enjoy  the  linear  speedup  property  of  Turing  machines. 

4.3  Space  Requirements  and  the  Invariance  Thesis 

Slot  and  van  Emde  Boas  (1988)  proposed  the  following  Invariance  Thesis: 

“There  exists  a  standard  class  of  machine  models,  which  includes  among  others  all 
variants  of  Turing  machines,  all  variants  of  RAMs  and  RASPs  with  logarithmic 
time  and  space  measures,  and  also  the  RAMs  and  RASPs  in  the  uniform  time 
and  logarithmic  space  measure,  provided  only  standard  arithmetical  instructions 
of  additive  type  are  used.  Machine  models  in  this  class  simulate  each  other  with 
polynomially  bounded  overhead  in  time  and  constant  factor  overhead  in  space.” 

They  showed  that  using  the  proper  space  measure  for  RAMs,  the  thesis  is  true  for  RAMs 
and  Turing  machines  in  its  strictest  interpretation;  i.e.,  where  the  time  and  space  bounds  on 
simulations  are  met  simultaneously.  The  RAM  space  measure  they  proposed  charges,  for  each 
register  used,  the  logarithm  of  the  register  address  plus  the  logarithm  of  the  largest  value  stored 
in  the  register  during  the  computation. 
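
As a small numerical illustration of this measure, the following Python sketch compares charging only for register contents with charging for addresses as well. The function names are hypothetical, the register file is given as a dictionary from address to the largest value stored, and the logarithm is approximated by the binary length of the integer (with a minimum charge of one bit); these are assumptions of the sketch, not definitions from Slot and van Emde Boas.

```python
def bit_length(x):
    """Binary length of |x|, charging at least one bit."""
    return max(1, abs(x).bit_length())


def size_contents_only(registers):
    """Charge each used register for the length of its contents only."""
    return sum(bit_length(value) for value in registers.values())


def size_with_addresses(registers):
    """Charge each used register for its address plus its contents."""
    return sum(
        bit_length(addr) + bit_length(value)
        for addr, value in registers.items()
    )
```

A machine that stores small values at widely scattered addresses is cheap under the first measure and expensive under the second, which is the difference the two measures are meant to capture.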

An obvious question is whether this thesis applies as well to pointer machines. Schönhage
(1980) presented a real-time equivalence between unit-cost SRAMs, which meet the qualifications
of the thesis with respect to time, and pointer machines. So the thesis holds for pointer
machines with respect to time complexity. As van Emde Boas (1989) has noted, equivalence
with respect to space complexity depends on the definition of the space measure. The following
results make this clearer.

Theorem 4.2 If s(n)/log s(n) nodes can be created by a pointer machine in time O(t(n)), then
every multitape Turing machine of space complexity s(n) and time complexity t(n) can be
simulated by a pointer machine of mass complexity O(s(n)/log s(n)) and time complexity O(t(n)).
This applies whether both machines are deterministic, nondeterministic, or alternating.

Proof.  We  demonstrate  how  the  simulation  works  for  deterministic  machines  and  then  show 
how  the  result  extends  to  nondeterministic  and  alternating  machines. 

The  following  simulation  holds  for  Turing  machines  with  multiple  worktapes;  however,  for 
simplicity,  we  explain  how  to  simulate  a  Turing  machine  with  a  single  one-way  infinite  worktape 
and  a  read-only  input  tape.  Without  loss  of  generality,  the  worktape  alphabet  of  the  Turing 
machine  is  {0, 1}. 

We design a one-to-one correspondence between the storage configurations of a Turing
machine M using space s and the configurations of a pointer machine M' with O(s/log s) nodes:
we partition the worktape of M into blocks of size b = log(s/log s). With this partitioning,
there are s/b = O(s/log s) blocks. M' represents the tape contents with three node structures:
the tree, the blockset, and the cache (see Figure 4.2).

Figure 4.2: Representing a Turing machine with pointer machine nodes

The tree is a complete binary tree of height b. The blockset consists of O(s/log s) nodes,
$\beta_0, \beta_1, \ldots$, each node representing one block. The contents of a particular block are represented
by a pointer to a leaf of the tree. Since the tree has $2^b = s/\log s$ leaves, there is a one-to-one
correspondence between the leaves of the tree and the contents of a block. This one-to-one
correspondence can be observed by noting that each leaf of the tree can be reached from the
root by following a unique sequence of left and right branches. By assigning “0” to each left
branch and “1” to each right branch, we assign a b-bit number to the leaf. This b-bit number
corresponds to a unique configuration of the contents of a block of b cells.
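
The correspondence between leaves and block contents can be sketched in Python as follows. The class and function names are hypothetical, and each tree node stores a `bit` field recording whether it is a left or right child; a real pointer machine would have to recover this information through its pointers, so the field is a simplifying assumption of the sketch. Encoding walks down from the root along the block's bits, and decoding walks from the leaf back up to the root.

```python
class TreeNode:
    def __init__(self, parent=None, bit=None):
        self.parent = parent  # None at the root
        self.bit = bit        # 0 for a left child, 1 for a right child
        self.children = []    # [left, right] for internal nodes


def build_tree(height):
    """Build a complete binary tree of the given height and return its root."""
    root = TreeNode()
    level = [root]
    for _ in range(height):
        next_level = []
        for node in level:
            node.children = [TreeNode(node, 0), TreeNode(node, 1)]
            next_level.extend(node.children)
        level = next_level
    return root


def encode(root, bits):
    """Follow left (0) and right (1) branches; the leaf represents the block."""
    node = root
    for bit in bits:
        node = node.children[bit]
    return node


def decode(leaf):
    """Trace the path from the leaf to the root and recover the block bits."""
    bits = []
    node = leaf
    while node.parent is not None:
        bits.append(node.bit)
        node = node.parent
    bits.reverse()
    return bits
```

Both directions touch one tree node per bit, so each takes a number of steps proportional to the height b of the tree.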

Now consider two adjacent blocks $B_i$ and $B_{i+1}$ of M, such that the worktape head is
currently in either $B_i$ or $B_{i+1}$. $B_i$ and $B_{i+1}$ are represented in the blockset by $\beta_i$ and $\beta_{i+1}$,
respectively. During the simulation, M' keeps the contents of $B_i$ and $B_{i+1}$ in the cache.

The cache consists of a chain of $2\log(s/\log s)$ nodes and two additional nodes, “0” and “1.”
Each node in the chain has a pointer to either the “0” or the “1” node, so that the entire chain
is a direct representation of two adjacent blocks of M's worktape.

M' decodes a node $\beta_i$ into the cache as follows: M' finds the tree leaf pointed to by $\beta_i$. M'
then traces the path in the tree to the root, noting at each tree node whether it was a right or
a left child. For each tree node in the path, M' sets a pointer of a node in the cache to the “0”
or the “1” node, depending on whether the tree node was a left or a right child.

M' encodes the contents of half the cache back to the blockset by following the above steps
in reverse.

The simulation proceeds as follows: M' initially builds the blockset, tree, and cache. We
assume that blocks are numbered from left to right, that all cells of the worktape of M contain
0, and that M starts with its worktape head on the leftmost tape cell. So M' initially decodes
$\beta_0$ and $\beta_1$ into the cache. M' then begins the actual simulation of M.

Assume $B_i$ and $B_{i+1}$ are decoded in the cache. As long as the tape head remains within $B_i$
and $B_{i+1}$, M' performs a straightforward simulation of M, using the cache. Finite control and
input processing of M are simulated in a straightforward manner using finite control and input
processing of M'. When the tape head moves to the right of $B_{i+1}$, M' encodes $B_i$ back to $\beta_i$
and decodes $\beta_{i+2}$ into the cache, shifting to the left the “contents” of $B_{i+1}$ currently in the
cache (by using a few extra pointers to mark the ends and the middle of the cache, M' can
“shift” the cache to the left by switching pointers so that the right b nodes are now the left b
nodes). If the tape head moves to the left of $B_i$, then a similar operation occurs. At this time,
since the worktape head of M is in the middle of two blocks, M' can simulate at least b steps
of M before performing the encoding and decoding operations again.

During the simulation, M' creates O(s/log s) nodes: O(s/log s) nodes for the tree and blockset
and O(log(s/log s)) nodes for the cache. Therefore M' simulates M with mass O(s/log s).

The straightforward simulation in the cache requires a total of O(t) time. The only other
time requirement is for encoding and decoding the blocks and shifting. Shifting takes only a
constant amount of time. Encoding and decoding can be done in O(b) time, since that is the
height of the tree. Because M' maintains two decoded blocks, it performs an encoding and
decoding only every O(b) steps, so the total time required for the simulation is O(t).
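
The amortization argument can be checked on concrete head movements with a small Python sketch. It models only the head position and the pair of blocks currently held in the cache; the name `count_swaps` and the representation of head moves as +1 or -1 are assumptions of the sketch. Whenever the head leaves the two cached blocks, one encode/decode operation (a "swap") is counted, and the function also reports the smallest number of steps observed between consecutive swaps.

```python
def count_swaps(moves, b):
    """moves: sequence of +1 / -1 head moves on a one-way infinite tape.
    Returns (number of swaps, minimum number of steps preceding a swap)."""
    head = 0        # absolute position of the worktape head
    left_block = 0  # the cache holds blocks left_block and left_block + 1
    swaps = 0
    since_last = 0
    min_gap = None
    for move in moves:
        head = max(0, head + move)  # the head cannot move left of cell 0
        since_last += 1
        block = head // b
        if block > left_block + 1:
            left_block = block - 1  # head moved right: drop the left block
        elif block < left_block:
            left_block = block      # head moved left: drop the right block
        else:
            continue                # still inside the cached blocks
        swaps += 1
        min_gap = since_last if min_gap is None else min(min_gap, since_last)
        since_last = 0
    return swaps, min_gap
```

Since every swap is preceded by at least b steps, a run of t steps incurs at most t/b swaps of cost O(b) each, which is the O(t) total claimed above.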

To  prove  that  this  works  for  nondeterministic  and  alternating  machines,  we  note  that  the 
construction  implies  an  instruction-by-instruction  simulation;  thus  each  computation  path  can 
be  treated  as  a  separate  simulation.  So  each  computation  path  of  the  Turing  machine  is 
simulated  in  an  efficient  manner  by  the  corresponding  computation  path  of  the  pointer  machine. 

□ 

Dymond  and  Cook  (1980)  use  a  structure  similar  to  the  tree  and  blockset  in  their  analysis  of 
the  relationship  between  deterministic  Turing  machines  and  hardware  modification  machines  (a 
hardware  modification  machine  is  a  collection  of  variably  connected,  synchronously  operating 
finite  state  transducers). 

Corollary 4.3 If s(n)/log s(n) nodes can be created by a pointer machine in time O(t(n)), then
every multitape Turing machine of space complexity s(n) and time complexity t(n) can be
simulated by a pointer machine of capacity complexity O(s(n)) and time complexity O(t(n)). This
applies whether both machines are deterministic, nondeterministic, or alternating.

Proof. By Proposition 2.5, a pointer machine of mass complexity s has capacity complexity
O(s log s). By Theorem 4.2, we know that a Turing machine using space s can be simulated by
a pointer machine of mass complexity s/log s. This pointer machine has capacity complexity
O((s/log s) log(s/log s)) = O((s/log s) log s) = O(s). □
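
The final chain of equalities can be checked numerically with a short Python sketch. The function names are hypothetical, logarithms are taken base 2, and the constant factors hidden by the O-notation in Proposition 2.5 are ignored, so this is only a sanity check of the arithmetic and not part of the proof.

```python
import math


def mass_nodes(s):
    """Number of nodes s / log s used by the simulating pointer machine."""
    return s / math.log2(s)


def capacity_estimate(s):
    """m log m for m = s / log s, ignoring constant factors."""
    m = mass_nodes(s)
    return m * math.log2(m)
```

For s at least 2, the estimate equals s minus s(log log s)/log s, so it never exceeds s.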

Theorem  4.2  sharpens  the  following  result  of  van  Emde  Boas  (1989),  whose  pointer  machine 
simulator  used  mass  O(s/logs),  but  time  0(t2)  (although  his  simulation  did  not  require  s/logs 
to  be  constructible): 

Theorem  4.4  (van  Emde  Boas,  1989)  Every  Turing  machine  of  space  complexity  s(n)  can 
be  simulated  by  a  pointer  machine  of  capacity  complexity  0(s(n))  (hence,  of  mass  complexity 
0(s{n) /log s(n)))  in  polynomial  time.  This  applies  whether  both  machines  are  deterministic  or 
nondeterministic. 

Van  Emde  Boas  also  gave  a  result  for  simulation  in  the  other  direction: 

Theorem  4.5  (van  Emde  Boas,  1989)  Every  pointer  machine  of  capacity  complexity  s(n) 
(hence, of mass complexity s(n)/log s(n)) can be simulated by a Turing machine in space s(n)
in  polynomial  time.  This  applies  whether  both  machines  are  deterministic  or  nondeterministic. 

Since  Slot  and  van  Emde  Boas  showed  that  the  Invariance  Thesis  holds  in  its  strictest 
interpretation  (i.e.,  where  time  and  space  bounds  are  met  simultaneously)  for  Turing  machines 
and  RAMs  using  the  size t  space  measure,  we  can  establish  a  relationship  between  RAMs  and 
pointer machines:

Corollary 4.6 Every pointer machine can be simulated by a RAM in polynomial time and with
constant  factor  overhead  in  space  (size;,  space  measure  for  RAMs). 

Corollary  4.7  Every  RAM  can  be  simulated  by  a  pointer  machine  in  polynomial  time  and  with 
constant  factor  overhead  in  capacity  (sizej  space  measure  for  RAMs). 

If  we  consider  capacity  as  the  true  measure  of  space  in  pointer  machines,  then  we  must 
reevaluate  the  result  of  Halpern  et  al.  (1986):  that  every  pointer  machine  of  time  complexity  t 
can be simulated by a pointer machine of space complexity O(t/log t). The authors considered
mass  as  the  space  measure.  A  different  approach  is  necessary  to  achieve  the  same  result  for 
capacity,  if  it  is  even  possible. 

Using  Theorem  4.4,  we  obtain  a  time-space  result  in  the  other  direction. 

Theorem  4.8  Every  pointer  machine  of  capacity  complexity  s(n)  can  be  simulated  by  a  pointer 
machine of time complexity O(n O(1)^{s(n)}). This applies whether the machines are deterministic
or  nondeterministic. 

Proof.  Let  X  be  D  or  N. 

XPM-CAPACITY(s) = XTM-SPACE(s)          (Theorem 4.4)
                ⊆ XTM-TIME(n O(1)^s)   (see Yap, 1987)
                ⊆ XPM-TIME(n O(1)^s)   (Schönhage, 1980)

□ 

Combining  Theorem  4.8  and  Proposition  2.5,  we  have: 

Corollary 4.9 Every pointer machine of mass complexity s(n) can be simulated by a pointer
machine of time complexity O(n s(n)^{O(s(n))}). This applies whether the machines are
deterministic or nondeterministic.

Proof. Let X be D or N.

XPM-MASS(s) = XPM-CAPACITY(O(s log s))          (Proposition 2.5)
            ⊆ XPM-TIME(O(n O(1)^{O(s log s)}))  (Theorem 4.8)
            = XPM-TIME(O(n s^{O(s)}))

□ 
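The final equality in this chain rewrites a constant raised to the power O(s log s) as a power of s. A short check, where c and k are hypothetical names for the constants hidden in the two O-terms and logarithms are taken base 2:

```latex
\[
c^{\,k s \log s}
  \;=\; \bigl(2^{\log c}\bigr)^{k s \log s}
  \;=\; \bigl(2^{\log s}\bigr)^{(k \log c)\, s}
  \;=\; s^{(k \log c)\, s}
  \;=\; s^{O(s)} .
\]
```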

4.4  Space  and  Time  Hierarchies 

We  obtain  pointer  machine  space  and  time  hierarchies  that  are  analogous  to  hierarchies  for 
RAMs  and  Turing  machines.  The  space  hierarchies  for  pointer  machines  follow  from  the  space 
hierarchy  for  Turing  machines  (Hartmanis  et  al.,  1965a;  Sipser,  1980)  and  Theorems  4.4  and 
4.5. 

Corollary 4.10 If s_2(n) is capacity-constructible, then there is a language L ⊆ {0,1}* such
that some pointer machine recognizes L within capacity O(s_2(n)), but for any function s_1(n) =
o(s_2(n)), no pointer machine recognizes L within capacity O(s_1(n)).

Proof. 

XPM-CAPACITY(s_1) ⊆ XTM-SPACE(s_1)          (Theorem 4.5)
                  ⊊ XTM-SPACE(s_2)          (Hartmanis et al., 1965a)
                  = XPM-CAPACITY(O(s_2))    (Theorem 4.4)

□ 

Corollary 4.11 If s_2(n) is mass-constructible, then there is a language L ⊆ {0,1}* such that
some pointer machine recognizes L within mass O(s_2(n)), but for any function s_1(n) = o(s_2(n)),
no pointer machine recognizes L within mass O(s_1(n)).



Proof. 


XPM-MASS(s_1) = XTM-SPACE(s_1 log s_1)    (Theorem 4.5)
              ⊊ XTM-SPACE(s_2 log s_2)    (Hartmanis et al., 1965a)
              = XPM-MASS(O(s_2))          (Theorem 4.4)

□ 

Corollary 4.12 If t(n)/log t(n) is mass-constructible, then PM-TIME(t(n)) is strictly included
in PM-MASS(t(n)).

Proof. 

PM-TIME(t) ⊆ PM-MASS(O(t/log t))    (Halpern et al., 1986)
           ⊆ PM-MASS(t/log t)       (Theorem 4.1)
           ⊊ PM-MASS(t)             (Corollary 4.11).

□ 

To  prove  the  time  hierarchy  for  pointer  machines,  we  need  the  following  useful  result: 

Lemma 4.13 For every pointer machine M there is a pointer machine M' with |Δ'| = 2 that
simulates M in real time and with constant factor overhead in space.

Proof. Let Δ be the pointer alphabet of M, with d = |Δ|, and let Δ' = {α, β} be the
pointer alphabet of M'. For each node x in M and each δ_i in Δ, let δ_i point from x to y_i. For
each node x in M there is a chain of d + 1 nodes in M', connected by α pointers, with one
distinguished node x' corresponding to x.

Consider one such chain in M'. From each node in the chain, except x', use the β pointer
to point to the appropriate y_i' corresponding to y_i (see Figure 4.3). □

[Figure 4.3 omitted. Panels: ORIGINAL, SIMULATION. Pointer encodings: δ_1 = αβ, δ_2 = ααβ,
δ_3 = αααβ, δ_4 = ααααβ.]

Figure 4.3: Simulating a pointer machine with |Δ'| = 2.
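The construction in the proof of Lemma 4.13 can be sketched in Python. This is an illustration under stated assumptions, not the machine itself: the storage structure is modeled as a dictionary, the function names are hypothetical, x' is taken to be the head of its chain (as the encodings in Figure 4.3 suggest), and the two pointers the proof leaves unspecified (β at x' and α at the end of the chain) are set to self-loops.

```python
def binarize(graph, alphabet):
    """graph maps each node x to {delta: target}; returns the two-pointer structure."""
    d = len(alphabet)
    binary = {}
    for x, pointers in graph.items():
        for i in range(d + 1):
            # Chain node (x, 0) plays the role of x'; alpha links the chain together.
            alpha_target = (x, i + 1) if i < d else (x, i)   # assumption: last node loops
            binary[(x, i)] = {"alpha": alpha_target, "beta": (x, i)}  # assumption: self-loop
        for i, delta in enumerate(alphabet, start=1):
            # The i-th chain node after x' carries the beta pointer to y_i'.
            binary[(x, i)]["beta"] = (pointers[delta], 0)
    return binary


def encode(alphabet, delta):
    """Word over {alpha, beta} that replaces the single pointer delta_i."""
    i = alphabet.index(delta) + 1
    return ["alpha"] * i + ["beta"]


def follow(binary, node, word):
    """Follow a word of pointers starting from node."""
    for symbol in word:
        node = binary[node][symbol]
    return node
```

Each pointer of M is replaced by a word of length at most d + 1, which is a constant for a fixed M, and each node of M becomes d + 1 nodes of M'.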

Theorem 4.14 If t_2(n) is time-constructible by a pointer machine, then there is a language
L ⊆ {0,1}* such that some pointer machine recognizes L within time O(t_2(n)), but for any
function t_1(n) = o(t_2(n)), no pointer machine recognizes L within time O(t_1(n)).

Proof. Using techniques Cook and Reckhow (1973) applied to exhibit a time hierarchy for
RAMs, we construct a universal pointer machine that can diagonalize over time t_1(n)
computations in time O(t_2(n)) (see also Hartmanis and Hopcroft, 1971).

It is straightforward to show how a pointer machine program can be encoded with alphabet
{0, 1}: we first encode each pointer machine instruction with {0, 1}. To properly encode
instruction labels referenced in if instructions, we may assume that every instruction is labeled and
that the instructions are labeled sequentially with the first instruction labeled 1. We then use a
unary encoding for the instruction label referenced in an if instruction. To encode the pointer
machine program, we concatenate the encodings of the instructions comprising the program.
Let Mw be the pointer machine whose encoding is w.

By Lemma 4.13 we may assume without loss of generality that for the machine to be
simulated, |Δ| = 2.

For every w, let w_i be the encoding in w of the instruction of Mw labeled i. Let k(Mw) =
max_i{|w_i|}. Thus k(Mw) is a constant that depends only on Mw. Define the diagonalization
language L as follows: if Mw with input w halts in time t_2(|w|)/k(Mw), then w is in L if and only
if Mw does not accept w. If Mw does not halt in time t_2(|w|)/k(Mw), then w is not in L.

Now suppose some pointer machine M recognizes L in time c t_1(n), where c is a constant
depending on L. Since t_1 = o(t_2), there is a w sufficiently long such that c t_1(|w|) < t_2(|w|)/k(Mw)
and Mw accepts exactly the same language as M: add a sufficient number of accept
instructions immediately after an accept instruction in the program of M and call the encoding of
this new pointer machine w. By our definition of L, w ∈ L (i.e., w is accepted by Mw) if and
only if w is not accepted by Mw; so we have a contradiction. Thus no pointer machine accepts
L in time O(t_1(n)).

We now construct a universal pointer machine M that recognizes L in time O(t_2(n)). The
key issues involved in the construction of such a machine are that it (1) constructs an appropriate
simulator of Mw in time O(|w|), (2) simulates Mw in linear time, and (3) keeps track of the
elapsed time of the simulated machine so that it can stop after t_2(n)/k(Mw) simulated steps.

M  goes  through  three  stages:  initialization,  preprocessing,  and  simulation. 

1.  Initialization: 

M first creates eight type nodes with an appropriate type pointer corresponding to each.
Each type node corresponds to a distinct pointer machine instruction type.

M creates two alphabet nodes (since |Δ| = 2) with an appropriate alphabet pointer
corresponding to each. M uses the alphabet nodes to designate appropriate arguments for the
pointer machine instructions, as described below.

Initialization  takes  a  constant  amount  of  time. 

2.  Preprocessing: 

M  reads  w  and  decodes  it,  creating  a  node  for  each  instruction.  Call  the  set  of  instruction 
nodes  the  instruction  list.  Each  instruction  node  has  the  following  additional  pointers: 

a.  a  successor  pointer  to  the  next  instruction  node. 

b.  an  instruction  pointer  to  the  appropriate  type  node. 

c.  an  argument  pointer  to  an  argument  structure  described  below. 

d.  a  jump  pointer,  for  an  if  instruction,  to  point  to  the  instruction  node  to  which  control 
could  be  passed. 

e.  a  beginner  pointer  to  the  initial  node  (used  to  set  the  jump  pointer  appropriately). 

For  each  instruction  node,  M  uses  the  type  pointer  to  determine  the  instruction  type  of  the 
instruction  node.  M  uses  alphabet  pointers  in  a  similar  manner. 

The argument structure is necessary for the create, center, assign, and if instructions.
These instructions have arguments that are words in Δ*. For any instruction with argument
W ∈ Δ* such that |W| = j, the argument structure contains a chain of j nodes, each node
having a pointer to one of the alphabet nodes so that M can access the chain to determine the
argument W.

To set the jump pointers, M makes a second pass through the instruction nodes and the
input word w. For each node representing an if instruction, M returns to the initial node with
the beginner pointer (recording the current instruction node with a pointer). If the if instruction
specifies a conditional jump to instruction m, M uses the unary encoding of m in w to move
the jump pointer to the mth instruction node. In this way, setting the pointers takes time
O(|w|).

During preprocessing, M also creates a linked list of t_2(|w|) nodes called the counter to keep
track of elapsed time of the simulation. This is possible since t_2 is time-constructible. There is
one special pointer to the last node in this list. M also creates a linked list of k(Mw) nodes to
serve as the auxiliary counter; the use of the auxiliary counter is explained below.

Three  additional  pointers  from  the  initial  node  keep  track  of  the  simulation.  The  execute 
pointer  points  from  the  initial  node  to  the  node  representing  the  instruction  being  simulated. 
The  counter  pointer  points  to  a  node  in  the  counter  to  indicate  how  much  time  has  elapsed. 
The  auxiliary  pointer  points  to  a  node  in  the  auxiliary  counter. 

Preprocessing takes O(t_2(|w|)) + O(|w|) = O(t_2(|w|)) time. Figure 4.4 shows the result of
initialization  and  preprocessing  (not  all  pointers  or  nodes  are  shown). 

3.  Simulation: 

To begin the simulation, M resets the execute and counter pointers to the beginnings of
the instruction list and counter, respectively. M also resets the input tape head to the leftmost
nonblank tape cell. M maintains a simulation arena to keep track of the simulation, so M
creates a node to serve as the initial node of Mw in the simulation arena.

[Figure 4.4 omitted. Labeled parts: alphabet nodes, counter.]

Figure 4.4: Preparation for simulating Mw


Using  the  instruction  pointer,  M  accesses  the  first  instruction  node,  decodes  the  instruction, 
and  executes  the  instruction  in  the  simulation  arena.  M  then  accesses  the  next  instruction  node 
by  following  the  successor  or  jump  pointer,  as  appropriate.  It  continues  in  this  manner,  decoding 
instructions  and  executing  them  in  the  simulation  arena,  until  it  reaches  an  accept  or  reject 
instruction  node. 

The  initial  node  serves  as  the  point  of  reference  for  the  simulation.  It  has  pointers  to 
the  current  instruction  node,  the  current  counter  node,  and  the  current  center  of  Mw  in  the 
simulation arena. There is a pointer from every node in the Δ-structure to the initial node so
that  it  may  always  be  referenced. 

For each simulated instruction, M moves the counter pointer across k(Mw) nodes of the
counter, using the k(Mw) nodes of the auxiliary counter. Since M has a special pointer to the
last counter node, M can compare the special pointer with the counter pointer to determine
whether t_2(|w|)/k(Mw) steps have elapsed. If so, then M rejects w. Otherwise, M accepts w if
and only if Mw rejects w.

Since the length of each encoded instruction of Mw is at most k(Mw), M can simulate each
step of Mw in c'k(Mw) steps, for a constant c' depending only on M. For each step, M counts
off k(Mw) of the t_2(|w|) counter nodes; therefore, M accepts L in time O(t_2(|w|)). □


4.5  Nondeterministic  Pointer  Machines 

Using  space  equivalence  for  pointer  machines  and  Turing  machines,  we  obtain  a  result  for  pointer 
machines  based  on  Savitch’s  (1970)  result  comparing  deterministic  and  nondeterministic  Turing 
machine space:

Theorem 4.15 For s(n) > log n, every nondeterministic pointer machine of capacity
complexity s(n) can be simulated by a deterministic pointer machine of capacity complexity O(s(n)^2).

Proof. 

NPM-CAPACITY(s) ⊆ NTM-SPACE(s)          (Theorem 4.5)
                ⊆ TM-SPACE(s^2)         (Savitch, 1970)
                ⊆ PM-CAPACITY(O(s^2))   (Theorem 4.4)

□ 

Corollary 4.16 For s(n) > (log n)/log log n, every nondeterministic pointer machine of mass
complexity s(n) can be simulated by a deterministic pointer machine of mass complexity
O(s(n)^2 log s(n)).

Proof. 

NPM-MASS(s) = NPM-CAPACITY(O(s log s))          (Proposition 2.5)
            ⊆ PM-CAPACITY(O(s^2 (log s)^2))     (Savitch, 1970; s log s > log n)
            = PM-MASS(O(s^2 log s))             (Proposition 2.5)

□ 

4.6  Alternating  Pointer  Machines 

One result of Chandra et al. (1981) is that every alternating Turing machine running in time
t can be simulated by a deterministic Turing machine using space t. The following theorem
extends that result to pointer machines. It strengthens the result of Halpern et al. (1986)
that every deterministic pointer machine running in time t can be simulated by a deterministic
pointer machine using O(t/log t) nodes.

Theorem 4.17 Every alternating pointer machine running in time t(n) can be simulated by a
deterministic pointer machine using t(n)/log t(n) nodes.

Proof. We construct a deterministic pointer machine M' that simulates alternating pointer
machine M. Without loss of generality, assume that M alternates strictly between universal
and existential states at each step, and that at every step there are exactly two choices. We
first describe how the simulation is accomplished with O(t) nodes, and then we show how to
reduce this to O(t/log t). The simulation does not require t or t/log t to be constructible: M'
begins by assuming t = 1; when more nodes are necessary, M' restarts the simulation with the
value of t incremented by 1. In this way, M' uses the minimum number of nodes necessary.

M' begins by creating a chain of t computation nodes and two additional nodes designated l
and r. Using the computation node chain, M' performs a depth-first traversal of the computation
tree of M in a manner analogous to the proof of Theorem 3.2 of Chandra et al. (1981). Each
computation node records a choice made by M on a particular branch of the computation tree
with a choice pointer to the l or the r node, indicating a left or right branch, respectively. M'
initially sets the choice pointer of every computation node to the l node.

For  each  branch  of  the  computation  tree  of  M,  M'  uses  the  recursive  simulation  procedure 
SIMULATE  of  Halpern  et  al.  (1986),  referring  to  the  computation  node  chain  to  determine  the 
proper instruction sequence. SIMULATE uses O(t/log t) nodes. When SIMULATE terminates
for  any  particular  branch  of  the  computation  tree,  M'  backtracks  along  the  computation  node 
chain,  resetting  pointers  appropriately  to  specify  the  next  branch  to  be  considered. 
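The traversal and backtracking can be sketched in Python. This is a simplified illustration, not the procedure of Halpern et al.: `leaf_accepts` is a hypothetical callback standing in for running SIMULATE along the branch named by the choice chain, the root of the computation tree is assumed to be existential, and the function name `accepts` is not from the source. Besides the chain of t choices, the sketch keeps only the value of the subtree just finished.

```python
def accepts(t, leaf_accepts):
    """Evaluate a depth-t alternating binary computation tree using a choice chain."""
    choices = ["l"] * t                  # every choice pointer starts at the l node
    while True:
        # Re-run the machine along the branch recorded in the chain.
        value = leaf_accepts(tuple(choices))
        depth = t - 1
        while depth >= 0:
            existential = depth % 2 == 0         # assumption: levels alternate from an OR root
            # An accepting child settles an existential node; a rejecting child
            # settles a universal node.
            settled = value if existential else not value
            if not settled and choices[depth] == "l":
                choices[depth] = "r"             # try the right branch next
                for j in range(depth + 1, t):
                    choices[j] = "l"             # reset the deeper choice pointers
                break
            depth -= 1                           # this node now has value `value`
        else:
            return value                         # backtracked past the root
```

When the traversal backtracks out of a right branch without short-circuiting, the node's value equals the value of that right branch, which is why no per-level result needs to be stored.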

This  simulation  uses  ct  nodes,  for  some  constant  c,  since  t  computation  nodes  are  required. 
The  chain  of  t  computation  nodes  is  equivalent  to  a  Turing  machine  tape  of  t  cells,  each  of 
which holds l or r. We reduce the number of nodes to O(t/log t) by encoding the chain using
the  tree  and  blockset  in  the  proof  of  Theorem  4.2  (see  Figure  4.2). 

By Theorem 4.1, we can simulate the alternating pointer machine using t/log t nodes. □

Corollary 4.18 Every alternating pointer machine running in time t(n) can be simulated by
a deterministic Turing machine using space t(n).

Proof. 

APM-TIME(t) ⊆ PM-MASS(t/log t)    (Theorem 4.17)
            = TM-SPACE(t)          (Theorem 4.5)

□ 

We  can  obtain  other  results  based  on  the  theorems  of  Chandra  et  al.  (1981). 

Theorem 4.19 For s(n) > n, every nondeterministic pointer machine of capacity complexity
s(n) can be simulated by an alternating pointer machine in time O(s(n)^2).

Proof. 

NPM-CAPACITY(s) ⊆ NTM-SPACE(s)         (Theorem 4.5)
                ⊆ ATM-TIME(O(s^2))     (Chandra et al., 1981)
                ⊆ APM-TIME(O(s^2))     (Theorem 4.2)

□ 

Corollary 4.20 For s(n) > n/log n, every nondeterministic pointer machine of mass
complexity s(n) can be simulated by an alternating pointer machine in time O((s(n) log s(n))^2).

Proof.

NPM-MASS(s) = NPM-CAPACITY(O(s log s))    (Proposition 2.5)
            ⊆ APM-TIME(O((s log s)^2))    (Theorem 4.19)

□ 

Theorem 4.21 For s(n) > log n, every alternating pointer machine of capacity complexity
s(n) can be simulated by a deterministic pointer machine in time O(1)^{s(n)}.

Proof.

APM-CAPACITY(s) ⊆ ATM-SPACE(s)        (Theorem 4.5)
                ⊆ TM-TIME(O(1)^s)     (Chandra et al., 1981)
                ⊆ PM-TIME(O(1)^s)     (Theorem 4.2)

□ 

Corollary 4.22 For s(n) > (log n)/log log n, every alternating pointer machine of mass
complexity s(n) can be simulated by a deterministic pointer machine in time s(n)^{O(s(n))}.

Proof. 

APM-MASS(s) = APM-CAPACITY(O(s log s))       (Proposition 2.5)
            ⊆ PM-TIME(O(1)^{O(s log s)})     (Theorem 4.21)
            = PM-TIME(s^{O(s)})

□

Chapter 5


Optimal Simulation of Tree Machines by Random Access Machines


We present an optimal on-line simulation of a tree machine of time complexity t by a log-cost
RAM of time complexity O((t log t)/log log t). This result is a complement to Loui’s (1983)
simulation of tree machines by multidimensional Turing machines and Reischuk’s (1982) simulation
of multidimensional Turing machines by tree machines. We begin by exhibiting a real-time
simulation of a tree machine by a unit-cost RAM.

5.1  Simulation  by  Unit-cost  RAMs 

Theorem  5.1  Every  tree  machine  can  be  simulated  by  a  unit-cost  RAM  in  real-time. 

Proof sketch. We design a unit-cost RAM R that simulates tree machine T with worktape
W. R has a contents memory, a parent memory, and several working registers. Let contents(x)
(respectively,  parent(x))  be  the  register  with  address  x  in  the  contents  (respectively,  parent) 
memory.  Contents(x)  at  address  x  contains  the  contents  of  cell(x)  at  location  x  in  the  worktape 
of  T.  If  cell(x)  is  visited  by  T,  then  parent(x)  contains  the  worktape  location  of  the  parent  of 
cell(x).  The  working  registers  are  used  as  temporary  storage  and  to  keep  track  of  which  cell  is 
currently  accessed  by  T. 

R simulates one step of T with a constant number of accesses to the two memories and the
working registers. For example, if the head moves from cell(x) to a child of cell(x), then R
computes location 2x for the left child or 2x + 1 for the right child with one or two additions
and stores x in parent(2x) or parent(2x + 1). Thus to simulate t steps of T takes O(t) time on
R.
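A minimal Python sketch of this simulation, assuming dictionaries stand in for the contents and parent memories and that unwritten cells read as 0; the class and method names are hypothetical. Child locations are computed with additions only, and the move to a parent is a single lookup in the parent memory.

```python
class TreeTapeRAM:
    """Unit-cost RAM view of a binary tree worktape (root at location 1)."""

    def __init__(self):
        self.contents = {}   # contents memory: location -> bit
        self.parent = {}     # parent memory: location -> location of the parent
        self.head = 1        # working register holding the scanned location

    def read(self):
        return self.contents.get(self.head, 0)

    def write(self, bit):
        self.contents[self.head] = bit

    def move_left(self):
        self._descend(self.head + self.head)         # 2x with one addition

    def move_right(self):
        self._descend(self.head + self.head + 1)     # 2x + 1 with two additions

    def _descend(self, child):
        self.parent[child] = self.head               # record x in parent(child)
        self.head = child

    def move_up(self):
        if self.head != 1:                           # the root has no parent
            self.head = self.parent[self.head]
```

Every method touches a constant number of registers, which is the sense in which the simulation runs in real time under the unit-cost measure.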

5.2  Simulation  by  Log-cost  RAMs 
5.2.1  Upper  Bound 

Using the simulation in Theorem 5.1, we can show that every tree machine can be simulated
on-line by a log-cost RAM in time O(t log t); however, we describe below a more efficient simulation
by log-cost RAMs.

For simplicity, we consider tree machines with only one tree worktape, but our results
generalize to multiple worktapes. Let T be a tree machine of time complexity t with one worktape
W. We show that there is a RAM R that simulates T on-line in time O((t log t)/log log t).
Since this is an on-line simulation, we do not know n or t(n) ahead of time. To solve this
problem, we use a technique of Galil (1976), adopted by Loui (1983; 1984a) and Katajainen et
al. (1988). Let t' be the elapsed time of T (as recorded by R) and let t_e be R's current estimate
of the total running time of T. R begins the simulation with t_e = 2. When t' exceeds t_e, R
doubles t_e and restarts the entire simulation. R continues this process of doubling t_e whenever
t' exceeds t_e until the simulation is finished. R records the input in a separate memory as
described below so that for each value of t_e > 2, it is unnecessary to move the input head until
t' > t_e/2. We show that for each value of t_e, the time of the simulation is O(t_e(log t_e)/log log t_e).
It is easy to show that the sum of the simulation times for all values of t_e is O(t'(log t')/log log t').
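One way to see the bound on the sum is the following sketch. It assumes the estimates are t_e = 2^i for i = 1, ..., k, where 2^{k-1} < t' ≤ 2^k, that the run with estimate 2^i costs at most c 2^i i/log i for i ≥ 2 (the run with i = 1 costs O(1)), and that k ≥ 4, so that i/log i ≤ k/log k for 2 ≤ i ≤ k. The constant c is a hypothetical name, and logarithms are base 2.

```latex
\[
\sum_{i=2}^{k} c\,2^{i}\,\frac{i}{\log i}
  \;\le\; c\,\frac{k}{\log k}\sum_{i=2}^{k} 2^{i}
  \;<\; c\,\frac{k}{\log k}\,2^{k+1}
  \;<\; 4c\,t'\,\frac{k}{\log k}
  \;=\; O\!\left(\frac{t'\log t'}{\log\log t'}\right),
\]
```

where the third inequality uses 2^{k-1} < t' and the last step uses k < log t' + 1.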

We first provide a brief description of the simulation. We choose parameters h and u such
that u = 2^{2h+2} - 1. We specify the values of h and u later. R has several memories. R maintains
in the main memory the entire contents of W. The main memory represents W as overlapping
subtrees, called blocks. R represents the contents of each block Wx in one register rx of the
main memory. When the worktape head is in a particular block Wx, R represents Wx in the
cache memory. Step-by-step simulation is carried out in the cache, which represents the block
Wx in breadth-first order, one cell of Wx per register of the cache.

Because blocks overlap, when the worktape head exits Wx, it is positioned in the middle of
some  other  block  Wy.  At  this  time  R  packs  the  contents  of  the  cache  back  into  rx  in  the  main 
memory  and  unpacks  the  contents  of  ry  into  the  cache. 

The  details  of  the  simulation  follow. 

Let W[x,s] denote the complete subtree of W of height s rooted at cell(x). A block is any
subtree Wx = W[x,2h + 1] such that the depth of cell(x) is a multiple of h + 1. Since a block
has height 2h + 1, it contains 2^{2h+2} - 1 = u cells. Let the relative location of a cell within a
block be defined in a manner similar to the location of a cell, where the relative location of the
root of the block is 1, the relative locations of its children are 2 and 3, and so on.

Call a block Wp the parent block of Wx if cell(p) is the ancestor of cell(x) at distance h + 1
from cell(x). If Wx is the parent block of Wc, then Wc is a child block of Wx. Each block has
2^{h+1} child blocks. The topmost block of W, which contains the root of W, is called the root
block.
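To make the block structure concrete, the following Python sketch lists, for a cell given by its worktape location, every block that contains it together with the cell's relative location in that block. It assumes the location numbering used in the proof of Theorem 5.1 (root at location 1, children of x at 2x and 2x + 1); the function name is hypothetical.

```python
def blocks_containing(x, h):
    """Return (block root location, relative location) pairs for the cell at location x."""
    depth = x.bit_length() - 1              # depth of cell(x); the root has depth 0
    result = []
    # Block roots lie at depths that are multiples of h + 1.
    for root_depth in range(0, depth + 1, h + 1):
        below = depth - root_depth          # distance from the block root to cell(x)
        if below > 2 * h + 1:               # cell(x) lies below this block
            continue
        root = x >> below                   # ancestor of cell(x) at depth root_depth
        # Keep the path from the block root to cell(x) and prefix it with a 1.
        relative = (1 << below) | (x & ((1 << below) - 1))
        result.append((root, relative))
    return result
```

With h = 1, `blocks_containing(11, 1)` reports that cell(11) has relative location 11 in the root block and relative location 3 in the block rooted at cell(5). Every cell at depth h + 1 or more is reported twice, since consecutive levels of blocks share cells.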

Define the top half of a block Wx as W[x,h], and define the bottom half of Wx as the
remaining cells of the block. Note that the top half of the block Wx is part of the bottom half
of Wp, its parent block, so that the blocks overlap. Call the portion of Wx shared by Wp (i.e.,
the subtree W[x,h]) the common subtree of Wx and Wp.

R precomputes in separate memories two tables, half and translate. We explain later how
R uses these tables. Here we describe their contents and how they are computed. Let half(z)
(respectively, translate(z)) be the register in half (respectively, translate) at address z.

Half(z) contains ⌊z/2⌋. To compute half, for z = 1, ..., ⌊u/2⌋, R stores z in half(2z) and
half(2z + 1).

For z = 2^{2h+1}, ..., u, translate(z) contains (z mod 2^{h+1}) + 2^{h+1}. R never refers to any
register in translate with address less than 2^{2h+1}. Translate is computed as follows:

    i := 2^{h+1}
    for z = 2^{2h+1} to u do
        translate(z) := i
        i := i + 1
        if i = 2^{h+2} then i := 2^{h+1}
    end
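A minimal Python sketch of both table computations. Inside the loops it uses only additions and comparisons, in the spirit of a RAM without division; Python's `**` appears only to set up the loop bounds. The function name `build_tables` is hypothetical, and dictionaries stand in for the two table memories.

```python
def build_tables(h):
    """Return (u, half, translate) for blocks of height 2h + 1."""
    u = 2 ** (2 * h + 2) - 1

    # half(z) = floor(z / 2), filled in without dividing.
    half = {}
    for z in range(1, u // 2 + 1):
        half[z + z] = z
        half[z + z + 1] = z

    # translate(z) = (z mod 2^(h+1)) + 2^(h+1) for z = 2^(2h+1), ..., u,
    # produced by a counter that wraps around when it reaches 2^(h+2).
    translate = {}
    i = 2 ** (h + 1)
    for z in range(2 ** (2 * h + 1), u + 1):
        translate[z] = i
        i = i + 1
        if i == 2 ** (h + 2):
            i = 2 ** (h + 1)
    return u, half, translate
```

Both loops run O(u) times, one table entry per iteration.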

We now show how R simulates the tree machine using the cache. Assume the head of T is
currently scanning a cell in block Wx. Let cache(z) be the register in the cache with address
z and let cell(x,z) be the cell in Wx with relative location z. For each z = 1, ..., u, register
cache(z) contains the bit in cell(x,z); for example, cache(1) contains the contents of cell(x,1)
= cell(x), the root of Wx. Thus R uses u registers of the cache, each register containing one
bit.

While  the  head  of  T  remains  in  Wx,  R  keeps  track  of  the  head’s  location  with  the  cache 
address  register  in  the  working  memory,  a  memory  maintained  by  R  for  storing  information 
necessary  for  miscellaneous  tasks.  If  the  cache  address  register  contains  z,  then  cell(x,z)  is 
currently  being  accessed  in  T. 

To simulate a tree machine operation at cell(x,z), R loads the contents (one bit) of cache(z)
into AC. Once the contents are in AC, R simulates one step of T by storing either 0 or 1 in
cache(z).

If the head of T moves to a child of cell(x,z), then the new address for the cache address
register, as well as the relative location of the new block cell being read, is either 2z or 2z + 1.
With one or two additions, R computes this new address and places it in the cache address
register. When the head of T moves to the parent of cell(x,z), the address of the corresponding
cache register is ⌊z/2⌋. Because R has no division operation, it accesses table half to retrieve
the new address in cache.

To  describe  what  happens  when  the  worktape  head  moves  out  of  the  current  block,  we  first 
show  how  the  blocks  are  stored  in  main  memory.  Main  memory  is  divided  into  pages  consisting 
of 2^{h+1} + 3 registers each. A page corresponds to a visited block of W. Let page(x) be the page
representing  Wx.  Define  the  address  of  a  page  to  be  the  address  of  the  first  register  in  the  page. 
The  first  register  in  page(x)  is  the  contents  register.  For  the  page  representing  the  root  block, 
the  contents  register  contains  the  entire  contents  of  that  block.  For  every  other  block  Wy,  the 
contents  register  contains  the  contents  of  the  bottom  half  of  Wy.  The  contents  of  cells  in  a 
block  are  kept  in  breadth-first  order;  i.e.,  reading  the  binary  string  in  the  contents  register  from 
left  to  right  is  equivalent  to  reading  the  bottom  half  of  the  block  it  represents  in  breadth-first 
order.  Initially,  all  cells  of  a  block  contain  0,  so  all  contents  registers  initially  contain  0. 

Following the contents register is the rank register, containing a number ℓ between 1 and
2^{h+1} indicating that Wx is the ℓth child of its parent block. The next register is the parent
register, containing the address of the page representing the parent block of Wx. The next 2^{h+1}
registers are the child registers of Wx. The mth child register of page(x) contains the address
of the page representing the mth child block of Wx or 0 if that child block has not been visited
(see Figure 5.1).
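The page layout can be sketched in Python with a flat list standing in for main memory. The class and method names are hypothetical, and two details are assumptions of the sketch rather than statements of the source: the root page sits at address 0 (harmless here, since the root block is never a child, so 0 in a child register still means "not visited"), and the root's rank and parent registers are filled with 0.

```python
class MainMemory:
    """Pages of 2^(h+1) + 3 registers: contents, rank, parent, then child registers."""

    def __init__(self, h):
        self.fanout = 2 ** (h + 1)           # number of child blocks per block
        self.page_size = self.fanout + 3
        self.reg = []                        # the registers of main memory
        self.allocate(rank=0, parent=0)      # first page: the root block

    def allocate(self, rank, parent):
        """Append a fresh page and return its address (the address of its first register)."""
        address = len(self.reg)
        # Contents register starts at 0 because all cells initially contain 0.
        self.reg += [0, rank, parent] + [0] * self.fanout
        return address

    def child_page(self, page, rank):
        """Address of the page for the rank-th child block, allocating it on first visit."""
        slot = page + 2 + rank               # child register for rank 1 is at page + 3
        if self.reg[slot] == 0:
            self.reg[slot] = self.allocate(rank, page)
        return self.reg[slot]
```

Pages are appended in the order in which blocks are first visited, so main memory holds one page per visited block.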


[Figure 5.1 omitted. Panels: TREE WORKTAPE (block Wx with root at depth (j)(h + 1)) and MAIN
MEMORY (pages holding the bottom halves of Wx and Wc, each with contents, rank, parent, and
child registers).]

Figure 5.1: Worktape W (head moves from Wx to Wc)

[Figure 5.2 omitted. Labeled part: page(p).]

Figure 5.2: Updating page(p) in main memory

The  first  page  in  main  memory  corresponds  to  the  root  block.  Blocks  are  then  stored  in 
the order in which they are visited. The page address register, a register in working memory,
contains  the  address  of  the  page  in  main  memory  corresponding  to  the  currently  accessed  block. 

Let  Wx  be  the  currently  accessed  block  and  let  Wp  be  the  parent  block  of  Wx.  When  the 
tree  worktape  head  moves  out  of  Wx  so  that  it  is  positioned  in  the  middle  of  a  child  block  Wc, 
R  makes  the  proper  changes  to  main  memory  and  loads  the  cache  from  the  contents  register  of 
page(c).

In  main  memory,  R  updates  the  contents  registers  of  page(x)  and  page(p).  To  update 
page(x),  R  packs  the  contents  of  the  registers  of  the  cache  which  correspond  to  the  bottom  half 
of Wx into a single register in working memory (call it the transfer register, denoted by tr). R
then  copies  tr  into  the  contents  register  of  page(x)  via  AC  (see  Figure  5.2). 

Updating page(p) consists of changing the bits of its contents register corresponding to the
common subtree of Wx and Wp. R first saves the contents of the cache that encode the common
subtree of Wx and Wc in a portion of working memory, since this information is needed in the
cache as the top half of Wc. R also saves the contents of the cache that encode the common
subtree of Wx and Wp. R then loads the contents register of page(p) into tr and unpacks the
contents into the cache. The bits in working memory corresponding to the common subtree of
Wx and Wp are then written into their proper locations in the portion of the cache representing
the bottom half of Wp. R then packs the contents of the cache into tr and copies tr into the
contents register of page(p).

R  then  determines  whether  Wc  has  been  visited  before  by  checking  the  contents  of  the  child 
register  of  page(x)  corresponding  to  Wc.  If  the  child  register  contains  a  valid  (i.e.,  nonzero) 
address,  then  R  uses  that  address  to  access  page(c).  R  then  unpacks  the  contents  register  of 
page(c)  into  the  cache.  This  action  is  similar  to  the  manipulation  of  page(p)  discussed  above. 
R  loads  the  contents  of  the  common  subtree  of  Wx  and  Wc  saved  in  working  memory  into  the 
registers  of  the  cache  representing  the  top  half  of  the  block. 

If  the  child  register  of  page(x)  contains  0,  then  R  allocates  a  new  page  to  maintain  the 
information  on  Wc. 

R modifies the page address register to reflect the fact that the worktape head is now
scanning block Wc. The address currently in this register is that of page(x). R writes the
address of page(c) in main memory to the page address register. R determines from the cache
address register the quantity i such that Wc is the ith child of Wx. Then by accessing the ith
child register of page(x) in the main memory, R can determine the address of page(c).

To modify the cache address register to reflect the relative location of the head within block
Wc, R first translates the relative location of the leaf cell(x, z) in Wx into its relative location
in Wc. Since leaf cell(x, z) in Wx is the same as cell(c, $(z \bmod 2^{h+1}) + 2^{h+1}$) in Wc, R uses
the table translate described above. Using one or two additions, R then calculates the relative
location in Wc of this cell's left or right child, depending on which branch the worktape head
used to exit Wx. R then writes this new relative location into the cache address register.

A  similar  sequence  of  operations  occurs  if  the  worktape  head  moves  out  of  a  block  (and 
farther)  into  its  parent  block  instead  of  into  a  child  block.  Then  R  uses  the  parent  register  to 
determine  the  address  of  the  page  representing  the  parent  block,  and  R  uses  the  rank  register 
to  determine  the  relative  location  of  the  worktape  head  within  the  parent  block. 
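
The page bookkeeping described above can be sketched in Python. This is only an illustration under stated assumptions, not the RAM program itself: the names `Page`, `PagedMemory`, `move_to_child`, and `move_to_parent` are hypothetical, Python lists and integers stand in for RAM registers, address 0 is reserved to mean "unvisited", and the packing and unpacking of cache contents is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One page of main memory: contents, rank, parent, and child registers."""
    rank: int            # which child of its parent block this block is (0 for the root)
    parent: int          # address of the parent's page (0 for the root, by assumption)
    num_children: int    # number of child registers in the page
    contents: int = 0    # packed bits of the block's bottom half (not modeled here)
    children: list = field(default_factory=list)

    def __post_init__(self):
        # A child register holding 0 means that child block has not been visited.
        self.children = [0] * self.num_children


class PagedMemory:
    def __init__(self, num_children):
        self.num_children = num_children
        # Address 0 is unused so that 0 can mean "unvisited"; the root page is at address 1.
        self.pages = [None, Page(rank=0, parent=0, num_children=num_children)]
        self.current = 1  # plays the role of the page address register

    def move_to_child(self, i):
        """The head leaves the current block into its i-th child block (1-based)."""
        page_x = self.pages[self.current]
        addr = page_x.children[i - 1]
        if addr == 0:
            # Child block never visited: allocate the next page, in visiting order.
            self.pages.append(Page(rank=i, parent=self.current,
                                   num_children=self.num_children))
            addr = len(self.pages) - 1
            page_x.children[i - 1] = addr
        self.current = addr
        return addr

    def move_to_parent(self):
        """The head leaves the current block into its parent; returns (address, rank)."""
        page_x = self.pages[self.current]
        if page_x.parent == 0:
            raise ValueError("the root block has no parent")
        self.current = page_x.parent
        return self.current, page_x.rank
```

Pages are appended in the order in which blocks are first visited, and revisiting a block follows the stored child address instead of allocating a new page.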

As described earlier, R maintains an estimate $t_e$ of the total running time of T. R doubles
$t_e$ whenever the elapsed time exceeds $t_e$ and restarts the simulation with this new value. The
portion of the input string read by T up to time $t_e/2$ is maintained in R's input memory in
registers of length h. Input symbols read from time 1 to time h are contained in the first register
of input memory, those read from time h + 1 to time 2h are contained in the second register, etc.
Each register is unpacked into the input cache at its appropriate time, and the input symbols
are read by R. After $t_e/2$ steps of the tree machine have been simulated, input is read from the
input tape. This new input is stored in the same manner as previous input.

When  it  is  necessary  to  restart  the  simulation  with  a  new  value  of  t,  R  reorganizes  the  input 
memory  using  packs  and  unpacks  so  that  register  lengths  reflect  the  updated  value  of  h. 

To  simulate  tree  machines  with  more  than  one  worktape,  R  maintains  a  main  memory,  a 
cache,  and  a  working  memory  for  each  worktape. 

By  evaluating  the  cost  of  the  simulation  on  a  log-cost  RAM,  we  derive  the  following  result. 

Theorem 5.2 Every tree machine running in time t(n) can be simulated on-line by a log-cost
RAM running in time $O((t(n) \log t(n))/\log\log t(n))$.

Proof. Because the blocks have height 2h + 1 and overlap by height h + 1, whenever the
worktape head moves out of a block, it is exactly in the middle of another block; i.e., T will take
at least h' = h + 1 steps before its worktape head exits this new block. Since the tree machine
computation has at most t steps, the work of updating main memory from cache (packing),
loading a new block into the cache (unpacking), and directly simulating h' steps is performed
at most t/h' times.

Updating main memory and loading a new block in cache involve the pack and unpack
operations and a constant number of accesses to main memory. Registers in main memory have
addresses no larger than $(t/h')(2^{h+1} + 3)$. Thus accesses to main memory take time $O(\log t + h)$.

By Lemma 2.3, the time for the pack and unpack operations is $O(u \log u)$. By Lemma 2.4,
the time to create the tables necessary for these operations is $O(u 2^u)$. The time to compute
tables half and translate is $O(u \log u)$.

Simulating one step of the tree machine consists of a constant number of accesses to cache,
taking time $O(\log u)$. Thus simulating h' steps takes time $O(h' \log u)$.

Simulating h input operations (those up to step t/2) takes time $O(h \log h)$. Recording h
input operations (those past step t/2) also takes time $O(h \log h)$. Packing and unpacking takes
time $O(h \log h)$. Thus the time to simulate t/2 input operations and record t/2 additional input
operations is $(t/h)\,O(h \log h)$. Reconfiguring the input memory for a new value of t also takes
time $(t/h)\,O(h \log h)$. Building the necessary tables for input simulation and recording takes
time $O(h 2^h)$.

The total time required for R, then, is

$$(t/h')\bigl(O(\log t + h) + O(u \log u) + O(h' \log u)\bigr) + O(u 2^u) + t \log h.$$

Since $h = O(\log u)$, the total time is

$$O\bigl((t \log t)/\log u + tu + t \log u + u 2^u\bigr).$$

Choose h so that $u = (\log t)/\log\log t$. Then the total time for the simulation is $O((t \log t)/\log\log t)$.

□ 
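
As a rough numeric sanity check of the last step of this proof, the following Python sketch evaluates the four terms of the bound with $u = (\log t)/\log\log t$ and compares them with the target $(t \log t)/\log\log t$. All hidden constants are set to 1 and logarithms are taken base 2; both are assumptions made only for illustration, and the function names are hypothetical.

```python
import math


def simulation_cost_terms(t):
    """Terms of the total-time bound with u = (log t)/(log log t), constants dropped."""
    log_t = math.log2(t)
    u = log_t / math.log2(log_t)
    return {
        "(t log t)/log u": t * log_t / math.log2(u),
        "t u": t * u,
        "t log u": t * math.log2(u),
        "u 2^u": u * 2 ** u,
    }


def target_bound(t):
    """The claimed bound (t log t)/(log log t), constant dropped."""
    return t * math.log2(t) / math.log2(math.log2(t))


if __name__ == "__main__":
    for exponent in (64, 256, 1024):
        t = 2 ** exponent
        worst = max(simulation_cost_terms(t).values())
        print(exponent, worst / target_bound(t))
```

For the sampled values of t, every term stays within a small constant factor of the target, which is what the choice of u is meant to achieve.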

5.2.2  Lower  Bound 

We  now  show  that  the  time  bound  of  Theorem  5.2  is  optimal  within  a  constant  factor.  We 
begin  with  an  overview  of  Kolmogorov  complexity,  which  we  use  to  prove  the  lower  bound. 

Let $\sigma$ and $\tau$ be strings in $\{0,1\}^*$, and let U be a universal Turing machine. Define the
Kolmogorov complexity of $\sigma$ given $\tau$ with respect to U, denoted $K(\sigma|\tau)$, as follows: let # be
a symbol not in $\{0,1\}$; then $K(\sigma|\tau)$ is the length of $\beta$ where $\beta$ is the shortest binary string
such that $U(\beta\#\tau) = \sigma$. Informally, $K(\sigma|\tau)$ is the length of the shortest binary description of
$\sigma$, given $\tau$. If $\tau$ is the empty string, then we write $K(\sigma)$ for $K(\sigma|\tau)$.

We say a string $\sigma$ is incompressible if $K(\sigma) \ge |\sigma|$. Note that for all n there are $2^n$ binary
strings of length n, but there are only $2^n - 1$ strings of length less than n. Thus for all n, there
is at least one incompressible string of length n.

A useful concept in Kolmogorov complexity is the self-delimiting string. For natural number
n, let bin(n) be the binary representation of n without leading 0's. For binary string w, let $\bar{w}$ be
the string resulting from placing a 0 between each pair of adjacent bits in w and adding a 1 to
the end. Thus $\overline{110} = 101001$. We call the string $\overline{bin(|w|)}\,w$ the self-delimiting version of w. The
self-delimiting version of w has length $|w| + 2 \log(|w| + 1)$. When we concatenate several binary
string segments of differing lengths, we can use self-delimiting versions of the strings so that
we can determine where one string ends and the next string begins with little additional cost
in the length of the concatenated string. Note that in such a concatenation it is not necessary
to use a self-delimiting version of the last string segment.
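
The self-delimiting encoding just described can be sketched in Python as follows. The function names `bar`, `self_delimiting`, and `decode_stream` are hypothetical, strings of the characters 0 and 1 stand in for binary strings, and the empty string is given the length prefix for "0" (an assumption, since bin(0) without leading 0's would itself be empty).

```python
def bar(w: str) -> str:
    """Place a 0 between each pair of adjacent bits of w and append a 1."""
    return "0".join(w) + "1"


def self_delimiting(w: str) -> str:
    """Return bar(bin(|w|)) followed by w itself."""
    return bar(format(len(w), "b")) + w


def decode_stream(s: str, count: int) -> list:
    """Split s into `count` self-delimited strings followed by one plain last string."""
    out, pos = [], 0
    for _ in range(count):
        length_bits = []
        while True:
            length_bits.append(s[pos])   # a bit of bin(|w|)
            marker = s[pos + 1]          # 0 = more length bits follow, 1 = end of length
            pos += 2
            if marker == "1":
                break
        n = int("".join(length_bits), 2)
        out.append(s[pos:pos + n])
        pos += n
    out.append(s[pos:])                  # the last segment needs no length prefix
    return out
```

For example, `decode_stream(self_delimiting("10110") + self_delimiting("0") + "111", 2)` recovers the three segments, and only the first two carry the length overhead.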
Kolmogorov complexity has recently gained popularity as a method for proving lower
bounds. Li and Vitanyi (1988) provide a thorough summary of lower bound (and other
complexity-related) results obtained using Kolmogorov complexity.

Theorem 5.3 There is a tree machine T running in time n such that for any log-cost RAM
R, R requires time $t(n) = \Omega((n \log n)/\log\log n)$ to simulate T on-line.

Proof. Tree machine T has one tree worktape and operates in real time. T's input alphabet
is a set of commands of the form $(e, \psi)$, where $e \in \{0, 1, ?\}$ and $\psi$ indicates whether the worktape
head moves to a child or parent of the current cell or remains at the current cell. Suppose T
is in a configuration in which the cell x at which the worktape head is located contains e'. On
input $(e, \psi)$, machine T writes e' onto its output tape, and the worktape head writes e onto cell
x if $e \in \{0, 1\}$, but it writes e' (the current contents of x) onto x if e = ?. At the end of the step
the worktape head moves according to $\psi$. For every n that is a sufficiently large power of 2, we
construct a sequence of n tree commands for which R requires time $\Omega((n \log n)/\log\log n)$. As
in (Loui, 1983), the string of tree commands is divided into a filling part of length n/2 and a
query part of length n/2.
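
The behavior of T on a command sequence can be sketched in Python. This is an illustration under assumptions that the text does not fix: the worktape is a binary tree, cells initially hold `0`, moves are written `L`, `R` (children), `P` (parent), and `S` (stay), and a parent move at the root leaves the head at the root. The function name `run_tree_machine` is hypothetical.

```python
def run_tree_machine(commands, blank="0"):
    """Simulate T on a list of (e, move) commands and return its output string.

    e is '0', '1', or '?'; move is 'L', 'R', 'P', or 'S' (assumed encoding).
    """
    tape = {}     # maps a cell's path from the root (a string over 'L', 'R') to its contents
    head = ""     # the root x0 is the empty path
    output = []
    for e, move in commands:
        current = tape.get(head, blank)
        output.append(current)        # T outputs the contents found before writing
        if e in ("0", "1"):
            tape[head] = e            # overwrite the cell
        # e == '?': the cell keeps its current contents
        if move in ("L", "R"):
            head += move              # move to a child
        elif move == "P":
            head = head[:-1]          # move to the parent (no-op at the root, by assumption)
        # 'S': the head stays where it is
    return "".join(output)
```

A filling command writes a bit and moves on, while a query command uses `?` so that the visited cell is reported without being changed.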

Let W be the worktape of T, and let $x_0$ be the root of W. Let $d = \log(n/8)$. Denote the
complete subtree of W of height d whose root is $x_0$ by $W_d$. Let $N = n/8$. We consider the
complexity of the simulation in terms of N.

We fill $W_d$ with an incompressible string $\tau$ of length $2N - 1$ such that $\tau$ can be retrieved by
a depth-first traversal of $W_d$. This takes time $4N - 4$ on T. We move the worktape head four
more times (without writing) so that the total length of the filling part is n/2.

The query part consists of a series of questions. A question is a string of $2d = 2 \log N$ tree
commands that causes the worktape head to move from the root $x_0$ of the tree worktape to a
cell at depth d and back to $x_0$ without changing the contents of the worktape. As the head visits
each cell during a question, T outputs the contents of that cell. T processes $2N/\log N$ questions
$Q_1, Q_2, \ldots$ during the query part. Thus the query part takes time $4N = n/2$. We show that
after each question $Q_j$, there is a question $Q_{j+1}$ such that R takes time $\Omega((\log^2 N)/\log\log N)$
to process $Q_{j+1}$, and Theorem 5.3 follows.

Assume that R has just processed question $Q_j$. Let P(N) be the maximum time necessary
to process any possible next question. We show that some next question takes time
$\Omega((\log^2 N)/\log P)$. Consequently, by definition, $P = \Omega((\log^2 N)/\log P)$. To determine a lower
bound on P, we consider two cases:

(1) $P < \log^2 N$; hence, $\log P < 2 \log\log N$. Thus we have the following:

$$\begin{aligned}
P &\ge c(\log^2 N)/\log P \quad \text{for some constant } c \text{ (since } P = \Omega((\log^2 N)/\log P)\text{)} \\
  &> c(\log^2 N)/(2 \log\log N);
\end{aligned}$$

(2) $P \ge \log^2 N$.

In either case, $P = \Omega((\log^2 N)/\log\log N)$.

We first determine t, the sum over all possible next questions q, of the time required for R
to process q.

Divide worktape W into $S = (\log N)/(2 \log P)$ sections, each of height $2 \log P$. For $s =
0, 1, \ldots, S - 1$, there are $P^{2s+2}$ exit points (bottom cells) in section s. We refer to any initial
segment of a question as a partial question and the portion of the question that is processed
while the worktape head is in one section as a subquestion (see Figure 5.3). To compute t,
we compute for $s = 0, 1, \ldots, S - 1$ the total time $t_s$ required for R to process all possible
subquestions in section s. Since the depth of $W_d$ is $\log N$, there are N possible next questions.
Each of the $P^{2s+2}$ bottom cells of section s is visited during $N/P^{2s+2}$ of these questions.
[Figure 5.3: Processing section s of worktape W]

Let $\sigma_s$ be the string defined by the contents of the bottom cells of section s, from left to
right; clearly, $|\sigma_s| = P^{2s+2}$.

Lemma 5.4 The string $\sigma_s$ is incompressible up to a term of $O(s \log P)$; i.e., $K(\sigma_s) \ge |\sigma_s| -
O(s \log P)$.

Proof. The incompressible string $\tau$, which gives the contents of $W_d$, can be specified by a
string composed of the following segments:

1. a self-delimiting string encoding this discussion (O(1) bits)

2. a self-delimiting version of a binary string of length $K(\sigma_s)$ that specifies $\sigma_s$ ($K(\sigma_s) +
O(s \log P)$ bits)

3. self-delimiting versions of the values of s and P ($O(\log s) + O(\log P)$ bits)

4. a string specifying the bits in $\tau$ but not in $\sigma_s$ ($2N - 1 - P^{2s+2}$ bits).


Thus $K(\tau) \le K(\sigma_s) + (2N - 1 - P^{2s+2}) + O(s \log P)$. But $K(\tau) \ge 2N - 1$; therefore, $K(\sigma_s) \ge
P^{2s+2} - O(s \log P)$. □ Lemma 5.4


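Lemma 5.5 If $\ell \ge 1$, then $\sum_{i=1}^{\ell} \log i \ge (1/2)\,\ell \log \ell$.

Proof. For all i such that $1 \le i \le \ell$, clearly $(i - 1)(\ell - i) \ge 0$; hence $i(\ell - i + 1) \ge \ell$.
Consequently,

$$\begin{aligned}
\sum_{i=1}^{\ell} \log i &= (1/2) \sum_{i=1}^{\ell} \bigl(\log i + \log(\ell - i + 1)\bigr) \\
&= (1/2) \sum_{i=1}^{\ell} \log\bigl(i(\ell - i + 1)\bigr) \\
&\ge (1/2) \sum_{i=1}^{\ell} \log \ell \\
&= (1/2)\,\ell \log \ell.
\end{aligned}$$

□ Lemma 5.5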
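
A quick numeric check of Lemma 5.5 and of the pairing inequality used in its proof can be written in Python. Base-2 logarithms and the function names are assumptions for illustration, and a small tolerance absorbs floating-point rounding in the equality case $\ell = 2$.

```python
import math


def pairing_holds(l: int) -> bool:
    """Check i(l - i + 1) >= l for every i in 1..l, the inequality used in the proof."""
    return all(i * (l - i + 1) >= l for i in range(1, l + 1))


def lemma_5_5_holds(l: int) -> bool:
    """Check sum_{i=1}^{l} log i >= (1/2) l log l, up to rounding error."""
    lhs = sum(math.log2(i) for i in range(1, l + 1))
    rhs = 0.5 * l * math.log2(l)
    return lhs >= rhs - 1e-9
```

Checking a range of small values, for instance `all(lemma_5_5_holds(l) for l in range(1, 500))`, exercises both the trivial case $\ell = 1$ and the tight case $\ell = 2$.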


Lemma 5.6 For $s = 1, 2, \ldots, S - 1$, the maximum number of distinct registers accessed during
the processing of all partial questions through section s - 1 is at most $4P^{2s+1}/\log P$.

Proof. Let $C = 4P/\log P$. By Lemma 5.5, for P sufficiently large, $\sum_{i=1}^{C} \log i > P$. The
processing of each partial question through section s - 1 could involve no more than C distinct
registers; otherwise, because of the total cost of addresses of registers, R would exceed time
P for some next question. There are $P^{2s}$ different partial questions possible through section
s - 1, so there are no more than $4P^{2s+1}/\log P$ distinct registers accessed for all possible partial
questions. □ Lemma 5.6
Let us consider a particular section s. Let $r_1, r_2, \ldots, r_m$ be the registers, in order of increasing
address, that R accesses to produce the same output that T produces when its worktape
head is in section s, excluding those registers accessed to process partial questions through
section s - 1. The address of $r_i$ is at least i. To compute a lower bound on $t_s$, we assess for
each i the contribution to $t_s$ of accessing $r_i$.

To determine the contribution of $r_i$ to $t_s$, we calculate the minimum number of possible
questions for which R accesses $r_i$. For every bottom cell v, let $q_v$ be the subquestion that
causes T to visit cell v of the tree worktape. For $1 \le i \le m$, let $X_i$ be the set of bottom cells
x of section s such that $x \in X_i$ if R accesses $r_i$ to process $q_x$ (see Figure 5.3). Thus if T visits
a cell in $X_i$ when processing a question in section s, R accesses register $r_i$ when processing the
same question. We say that $r_i$ operates on the bottom cells in $X_i$. Since T visits one cell of $X_i$
while processing one of $N/P^{2s+2}$ possible questions, R accesses $r_i$ during the processing of at
least $|X_i|(N/P^{2s+2})$ possible questions.

For $1 \le i \le m$, the total access time for register $r_i$ in section s is at least the product of
$\log i$ (since the address of $r_i$ is at least i), $|X_i|$ (the number of bottom cells that $r_i$ operates
on), and $N/P^{2s+2}$ (the number of questions during which one of these bottom cells is visited).
Summing the time incurred by access to each register yields:

$$t_s \ge \sum_{i=1}^{m} (\log i)\,|X_i|\,(N/P^{2s+2}). \qquad (5.1)$$

Using Lemma 5.8 below, we can determine a lower bound for $t_s$, but we first introduce the
following technical lemma.

Lemma 5.7 (Loui, 1984b [Section 4]) Let J and M be integers such that $M \ge J$. A sorted
J-member subset of $\{1, \ldots, M\}$ can be represented with no more than $2J \log(M/J) + 4J + 2$
bits.

Let $h = (1/7)P^{2s+1}$.


Lemma 5.8 $\sum_{i=h}^{m} |X_i| \ge (1/23)P^{2s+2}$.

Proof. Assume that the conclusion is false. Then $r_1, \ldots, r_{h-1}$ operate on at least
$(22/23)P^{2s+2}$ bottom cells in section s. We can specify the string $\sigma_s$ as follows: we obtain
the bits of $X_h, \ldots, X_m$ explicitly. We obtain the other bits of $\sigma_s$ by simulating R on each
partial question to a bottom cell of section s not in $\bigcup_{k=h}^{m} X_k$. On each such partial question, R
uses only registers $r_1, \ldots, r_{h-1}$ and registers accessed in sections $1, \ldots, s - 1$. Thus $\sigma_s$ can be
specified with a string composed of the following segments:

1. a self-delimiting string encoding the program of R and this discussion (O(1) bits)

2. self-delimiting versions of the addresses and initial contents of registers accessed in sections
$1, \ldots, s - 1$ (at most $8P^{2s+2}/\log P + O(s \log P)$ bits; by Lemma 5.6, at most $4P^{2s+1}/\log P$
registers are required, and for each register, the contents and the address could each require
P bits.)

3. self-delimiting versions of the addresses and initial contents of $r_1, \ldots, r_{h-1}$ ($(2/7)P^{2s+2} +
O(s \log P)$ bits)

4. a string specifying positions of cells in $X_k$ for $k \ge h$ (we use Lemma 5.7 with $J =
(1/23)P^{2s+2}$ and $M = P^{2s+2}$; this requires at most $(14/23)P^{2s+2}$ bits. The encoding
used to achieve Lemma 5.7 is such that the beginning and end of this string can easily be
determined.)

5. a string specifying the contents of cells in $X_k$ for $k \ge h$ (at most $(1/23)P^{2s+2}$ bits).

This means that the number of bits needed to specify $\sigma_s$ is at most $(151/161)P^{2s+2} +
O(P^{2s+2}/\log P) < P^{2s+2} - O(s \log P)$ for sufficiently large P. Thus we have a contradiction of
Lemma 5.4.

□ Lemma 5.8
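
The constants in this counting argument can be checked mechanically. The Python sketch below adds the leading coefficients of segments 3, 4, and 5 (each a multiple of $P^{2s+2}$) with exact fractions, and evaluates the bound of Lemma 5.7 at $J = M/23$. The names are hypothetical and the sample value of M is arbitrary.

```python
import math
from fractions import Fraction

# Leading coefficients of P^(2s+2) contributed by segments 3, 4, and 5.
leading_coefficient = Fraction(2, 7) + Fraction(14, 23) + Fraction(1, 23)


def lemma_5_7_bits(J: int, M: int) -> float:
    """Bit bound of Lemma 5.7 for a sorted J-member subset: 2J log(M/J) + 4J + 2."""
    return 2 * J * math.log2(M / J) + 4 * J + 2


if __name__ == "__main__":
    print(leading_coefficient)                          # exact sum of the three fractions
    M = 23 * 10**6
    print(lemma_5_7_bits(M // 23, M) <= 14 * M / 23)    # segment 4 fits in (14/23) M bits
```

Because the sum is strictly below 1, the lower-order terms $O(P^{2s+2}/\log P)$ and $O(s \log P)$ can be absorbed for sufficiently large P, which is what yields the contradiction.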


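Thus we have:

$$\begin{aligned}
t_s &\ge \sum_{i=1}^{m} (\log i)\,|X_i|\,(N/P^{2s+2}) && \text{(Inequality 5.1)} \\
&\ge \sum_{i=h}^{m} (\log i)\,|X_i|\,(N/P^{2s+2}) \\
&\ge (N/P^{2s+2})(\log h) \sum_{i=h}^{m} |X_i| \\
&\ge (N/P^{2s+2})(\log h)(1/23)P^{2s+2} && \text{(Lemma 5.8)} \\
&\ge (1/23)N\bigl((2s + 1)\log P - \log 7\bigr) && \text{(definition of } h\text{)} \\
&\ge (1/23)N s \log P.
\end{aligned}$$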


Now sum $t_s$ over all s to compute a lower bound for t, the total time required for R to
process all possible next questions:

$$\begin{aligned}
t &= \sum_{s=0}^{S-1} t_s \\
&\ge \sum_{s=0}^{S-1} (1/23)N s \log P \\
&\ge (1/23)N(\log P)\bigl((\log^2 N)/(8 \log^2 P) - O((\log N)/\log P)\bigr) \\
&\ge (1/184)\bigl((N \log^2 N)/\log P - O(N \log N)\bigr).
\end{aligned}$$


Since there are N questions, we divide t by N to derive the average time needed by R to
process the next question, $\Omega((\log^2 N)/\log P)$. Some next question must require time greater
than or equal to this average time. Since P is the maximum time for some next question,
$P \ge \Omega((\log^2 N)/\log P)$; hence, $P = \Omega((\log^2 N)/\log\log N)$.
Thus for each question $Q_j$, we can choose a next question $Q_{j+1}$ that takes time
$\Omega((\log^2 N)/\log\log N)$. Since the query part has $2N/\log N$ questions, our choice of questions
means that the query part takes time $t = (2N/\log N)\,\Omega((\log^2 N)/\log\log N) =
\Omega((N \log N)/\log\log N)$. The entire simulation takes at least time t. Since N = n/8, the
lower bound holds for n as well. □ Theorem 5.3

Because  the  lower  bound  proof  considers  only  the  time  involved  in  accessing  registers,  the 
lower  bound  holds  for  RAMs  with  more  powerful  instructions,  such  as  boolean  operations  or 
multiplication. 

5.3  Implications  for  Log-cost  RAMs  and  Unit-cost  SRAMs 

The  lower  bound  of  Theorem  5.3  implies  a  lower  bound  on  simulating  a  unit-cost  SRAM  by  a 
log-cost  RAM.  We  present  the  theorem  in  terms  of  pointer  machines  instead  of  unit-cost  RAMs. 

Theorem 5.9 There is a pointer machine P running in time O(n) such that for any log-cost
RAM R that simulates P on-line, R requires time $\Omega((n \log n)/\log\log n)$.

Proof. Let T be the tree machine described in Theorem 5.3. Let P be a pointer machine
that simulates T. It is straightforward to show that every tree machine can be simulated by
a pointer machine in real time. T runs in time n, so P runs in time O(n). Now assume
there is a log-cost RAM that simulates P on-line in time $o((n \log n)/\log\log n)$. We thus have
an on-line simulation of a tree machine of time complexity n by a log-cost RAM running in
time $o((n \log n)/\log\log n)$. But we know from Theorem 5.3 that the lower bound on such a
simulation is $\Omega((n \log n)/\log\log n)$; hence we have a contradiction. □
Chapter 6


Relationships between Multidimensional Turing Machines and RAMs


6.1  Simulation  of  Multidimensional  Turing  Machines  by  RAMs 

6.1.1  Simulation  by  Log-cost  RAMs 

By composing our simulation in Subsection 5.2.1 of a tree machine by a log-cost RAM with
Reischuk's (1982) simulation of a d-dimensional Turing machine by a tree machine, we obtain
an on-line simulation of a d-dimensional Turing machine of time complexity t by a log-cost
RAM running in time $O((5^{d \log^* t}\, t \log t)/\log\log t)$. But we improve this upper bound with a
direct simulation.

Theorem 6.1 Every d-dimensional Turing machine running in time t(n) can be simulated
on-line by a log-cost RAM running in time $O(t(n)(\log t(n))^{1-(1/d)}(\log\log t(n))^{1/d})$.

Proof.  We  design  a  log-cost  RAM  R  that  simulates  d-dimensional  Turing  machine  M . 
Since  this  is  an  on-line  simulation,  we  use  the  procedure  of  the  simulation  in  Subsection  5.2.1, 
doubling  t  as  necessary. 

For simplicity, assume M has one worktape; our results generalize to d-dimensional Turing
machines with more than one worktape. Let $s = ((\log t)/\log\log t)^{1/d}$. Partition the worktape
of M into boxes with side length s. Let base(i) be the base cell in box i. For every cell x in a
box, there are $3^d$ boxes that contain cells with coordinates that all differ from the coordinates
of x by at most s; i.e., there are $3^d$ boxes that are within distance s of cell x.

For box i, if base(i) = $(i_1, i_2, \ldots, i_d)$, let index(i) = $i_d t^{d-1} + i_{d-1} t^{d-2} + \cdots + i_1$. R stores the
contents of box i ($s^d$ bits) in the register in main memory with address index(i). Step-by-step
simulation is carried out in the cache. R conducts the simulation in t/s phases, each of s steps of
M. For each phase: R unpacks the contents of the $3^d$ boxes that are currently within distance
s of the worktape head (the head remains within these boxes during the phase); R simulates M
for s steps; and R packs the contents of the cache back to main memory. Using precomputed
values of $t, t^2, \ldots, t^{d-1}$ (R can compute each of these values in time O(t)), R quickly computes
index(i') from index(i) when box i' is adjacent to box i.
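
The address computation for boxes can be sketched in Python. The sketch assumes that base-cell coordinates are nonnegative integers and that the base cells of adjacent boxes differ by exactly s in one coordinate; the function names and the `powers` list of precomputed values $1, t, \ldots, t^{d-1}$ are hypothetical.

```python
def index(base, t):
    """index(i) = i_d t^(d-1) + ... + i_2 t + i_1 for the base cell (i_1, ..., i_d)."""
    return sum(coord * t ** k for k, coord in enumerate(base))


def neighbor_index(idx, axis, direction, s, powers):
    """Index of the adjacent box one box over along `axis` (direction is +1 or -1).

    powers[k] holds the precomputed value t^k, so the update costs one
    multiplication by s and one addition instead of re-evaluating the polynomial.
    """
    return idx + direction * s * powers[axis]


if __name__ == "__main__":
    t, s = 100, 3
    powers = [1, t, t * t]
    here = index((6, 9, 12), t)
    print(neighbor_index(here, 1, +1, s, powers) == index((6, 12, 12), t))
```

A box that is diagonally adjacent can be reached by applying the same update once per axis in which the base cells differ.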

For each phase, R takes time $O(\log t)$ to access main memory, $O(\log t)$ to compute the
address of registers in main memory representing the new blocks needed in the cache, $O(s \log s)$
to simulate s steps in the cache, $O(s^d \log s)$ to pack and unpack the appropriate registers
(Lemma 2.3), and $O(s^d 2^{s^d}) = o(t)$ to build the appropriate tables (Lemma 2.4). Thus the total
time for the simulation is:

$$\begin{aligned}
&(t/s)\bigl(O(\log t) + O(s \log s) + O(s^d \log s)\bigr) + O(s^d 2^{s^d}) \\
&\quad= O\bigl((t \log t)/s + t s^{d-1} \log s\bigr) \\
&\quad= O\bigl(t(\log t)^{1-(1/d)}(\log\log t)^{1/d}\bigr).
\end{aligned}$$

□ 
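
The final equality in the proof of Theorem 6.1 follows by substituting the chosen side length. The worked step below is a sketch that uses only $s = ((\log t)/\log\log t)^{1/d}$ from the proof and the observation that $\log s \le (1/d)\log\log t$; it shows that the two dominant terms balance.

```latex
\begin{align*}
\frac{t \log t}{s}
  &= t \log t \left(\frac{\log\log t}{\log t}\right)^{1/d}
   = t\,(\log t)^{1-(1/d)}(\log\log t)^{1/d}, \\
t\,s^{d-1}\log s
  &\le t \left(\frac{\log t}{\log\log t}\right)^{1-(1/d)} \cdot \frac{1}{d}\log\log t
   = \frac{1}{d}\, t\,(\log t)^{1-(1/d)}(\log\log t)^{1/d}.
\end{align*}
```

Both terms are therefore $O(t(\log t)^{1-(1/d)}(\log\log t)^{1/d})$, matching the bound stated in Theorem 6.1.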

6.1.2  Simulation  by  Unit-cost  RAMs  and  SRAMs 

Schönhage (1980) proved that every multidimensional Turing machine can be simulated by a
unit-cost SRAM in real-time. Because a unit-cost RAM can simulate a unit-cost SRAM in
real-time, it is clear that a unit-cost RAM can simulate a d-dimensional Turing machine in
real-time; however, we can do better by adapting a result of Grigor'ev:

Theorem 6.2 For $t(n) \ge n(\log n)^{1/d}$, every d-dimensional Turing machine running in time t
can be simulated on-line by a unit-cost RAM in time $O(t(n)/(\log t(n))^{1/d})$.

Proof sketch. Grigor'ev (1979) presented an off-line version of this result. He adapted the
simulation of a one-dimensional Turing machine by a unit-cost RAM (Hopcroft et al., 1975).
We briefly sketch his simulation. Let M be a d-dimensional Turing machine running in time
t, and let R be the RAM simulator. Call a box of M of side length $c(\log t)^{1/d}$, where constant
c < 1, a block. A d-dimensional Turing machine is block-respecting if its worktapes are divided
into blocks and its heads pass block boundaries only at times that are integer multiples of
$c(\log t)^{1/d}$. Grigor'ev showed that M could be converted into a block-respecting machine M'
running in time O(t). We construct R to simulate M'.

R computes the $O(t^c)$ possible configurations for M'. For each of these configurations, R
simulates M' for $c(\log t)^{1/d}$ steps to determine its next configuration. Computing the configurations
and next configurations takes time $O(t^c(\log t)^{1/d})$. The actual simulation of M' consists of
O(1) table lookups for every $c(\log t)^{1/d}$ steps of M'. Block adjacency information is maintained
using the pyramidal structure employed by Schönhage (1980) in his real-time simulation of a
multidimensional Turing machine by a pointer machine. This structure allows R to determine
which blocks of M' have already been visited by worktape heads. It can be maintained by R
in real-time (that is, O(1) steps for each block visit).

Grigor'ev's simulation can be converted into an on-line simulation using the same procedures
that Galil (1976) used to convert the simulation of Hopcroft et al. (1975) to an on-line
simulation. We do not know the value of n or t(n) ahead of time, so we use the procedure of
Subsection 5.2.1, doubling t as necessary. Another problem is that M' is block-respecting, but
it may need to read input symbols at time steps that are not integer multiples of $c(\log t)^{1/d}$.
Galil solved this problem with a super block-respecting machine, that is, a machine that is
block-respecting and only reads inputs at a time which is a multiple of $c(\log t)^{1/d}$. Galil showed
that M' could be modified so that it was super block-respecting and still run in time O(t).
This modification involved introducing appropriate delays between inputs. The same technique
works in this case. Hence we have R simulate the super block-respecting version of M', and we
now have an on-line simulation. □

6.2  Simulation  of  RAMs  by  Multidimensional  Turing  Machines 
6.2.1  Simulation  of  Log-cost  RAMs 

Loui (1983) provided an upper bound on simulating log-cost RAMs by multidimensional Turing
machines: he showed that every log-cost RAM of time complexity t can be simulated on-line
by a d-dimensional Turing machine in time $O(t^{1+(1/d)}/\log t)$.

We can prove that there is a log-cost RAM R running in time t such that every d-dimensional
Turing machine requires time $\Omega((t^{1+(1/d)}(\log\log t)^{1+(1/d)})/(\log t)^{2+(1/d)})$ to simulate
R on-line. Suppose, to the contrary, that every log-cost RAM can be simulated on-line
in time $o((t^{1+(1/d)}(\log\log t)^{1+(1/d)})/(\log t)^{2+(1/d)})$. Combining this simulation with the optimal
on-line simulation of tree machines by log-cost RAMs outlined in Subsection 5.2.1, we obtain
a simulation of tree machines by d-dimensional Turing machines. Applying this simulation
to the real-time tree machine T described in Loui's (1983) proof of a lower bound on on-line
simulation of tree machines by multidimensional Turing machines, we obtain a d-dimensional
Turing machine that simulates T on-line in time $o(n^{1+(1/d)}/\log n)$, which contradicts the lower
bound established by Loui.

Here we derive a stronger lower bound for simulating log-cost RAMs by multidimensional
Turing machines. We first introduce Lemma 6.3. This result relies on the fact that there is a
fixed constant c such that for all binary strings $\sigma$ and $\tau$,

$$K(\sigma) \le 2|\sigma| + c \quad \text{and} \quad K(\sigma) \le K(\sigma|\tau) + K(\tau) + c,$$

where K() denotes Kolmogorov complexity, defined in Subsection 5.2.2. The constant c is used
in the lemma.

Lemma 6.3 (Loui, 1983) Let $g \ge 1$ and let $\sigma$ be an incompressible string of length $n \ge 8(c + g)$.
For every set of g strings $\{\tau_1, \tau_2, \ldots, \tau_g\}$ each of length at most n/(4g), $K(\sigma|\tau_1\#\tau_2\#\cdots\#\tau_g) \ge
n/4$.

Theorem 6.4 There is a log-cost RAM R running in time O(n), where n is the input length,
such that for any d-dimensional Turing machine M that simulates R on-line, M requires time
$\Omega(n^{1+(1/d)}/(\log n\,(\log\log n)^{1+(1/d)}))$.

Proof. As in Loui (1983), we construct a hard input string consisting of a filling part and a
query part. The filling part comprises an incompressible binary string x of length $n/(2\log\log n)$.
Delimiters are added to x so that R can easily read x in pieces of length $(\log n)/2$. This string
is followed by some "dummy bits" to pad out the filling part so that its total length is n/2. R
processes the filling part in cycles. In each cycle, R reads the next $(\log n)/2$ bits into a cache
and packs these bits into one register of main memory; so R eventually writes x into the first
$n/(\log n \log\log n)$ registers of main memory.

R computes the tables necessary for packing after it reads the first piece of x. By maintaining
a counter during the reading of the first piece, R can determine the value of $(\log n)/2$ and use
this value to build the tables.

R takes time $O(\log n \log\log n)$ to determine the value of $(\log n)/2$. By Lemma 2.4 with
$u = (\log n)/2$, R takes o(n) time to precompute the tables necessary for packing. In each cycle,
R takes $O(\log n \log\log n)$ time for packing (Lemma 2.3) and $O(\log n)$ time for access to one
of the first $n/(\log n \log\log n)$ registers of main memory. Each of the $n/(\log n \log\log n)$ cycles
takes time $O(\log n \log\log n)$, so the filling part takes time O(n).

A question for R is a string of the form a$b, where a is a binary string of length $\log n$ that
specifies an address in main memory, and b is a binary string of length $\log\log n$ that specifies
the position of a particular bit within that register. To process question a$b, R accesses register
r(a) in main memory and outputs the bit at position b in ⟨a⟩, the contents of r(a), without
changing the contents of main memory. After R reads a into the accumulator, R uses an indirect
read to obtain ⟨a⟩ and unpacks ⟨a⟩ into the cache. R then reads b into the accumulator, accesses
the register in the cache corresponding to position b, and outputs that bit. The time to read a
and b into the accumulator and to unpack the contents of r(a) is $O(\log n \log\log n)$, so R takes
time $O(\log n \log\log n)$ on one question.
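
The behavior of R on the filling part and on a single question can be illustrated with a small sketch. It is not the log-cost RAM program itself: Python integers stand in for registers, the names `fill`, `answer`, and `piece_len` are hypothetical, bit positions are assumed to count from the left, and the delimiters, packing tables, and log-cost accounting are all omitted.

```python
def fill(x: str, piece_len: int) -> list[int]:
    """Filling part: pack x into registers, piece_len bits per register."""
    registers = []
    for start in range(0, len(x), piece_len):
        piece = x[start:start + piece_len]
        # Assumption: a short final piece is padded on the right with zeros.
        registers.append(int(piece.ljust(piece_len, "0"), 2))
    return registers


def answer(registers: list[int], piece_len: int, question: str) -> str:
    """Query part: process one question a$b without changing memory."""
    a, b = question.split("$")
    contents = registers[int(a, 2)]              # indirect read of r(a)
    cache = format(contents, f"0{piece_len}b")   # unpack into the cache
    return cache[int(b, 2)]                      # output the bit at position b


x = "1011001110001111"
registers = fill(x, piece_len=4)
# Asking every possible question a$b recovers x bit by bit.
recovered = "".join(
    answer(registers, 4, f"{a:02b}${b:02b}")
    for a in range(len(registers))
    for b in range(4)
)
```

The final loop reflects the property the lower bound relies on: the answers to all possible questions determine x completely.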

The query part is a sequence of $n/(\log n \log\log n)$ questions $Q_1, Q_2, \ldots$, so R takes time
O(n) to process the query part. Now we show how to choose questions so that M spends time
$\Omega((n/\log\log n)^{1/d})$ to process each $Q_j$.

Let M have h access heads on one worktape. For $j \ge 1$, consider the configuration of
M immediately before reading the first symbol of $Q_j$. Let $B_1, \ldots, B_h$ be the boxes of side
length $(n/(4c'h\log\log n))^{1/d}$ centered at the heads in this configuration, where the constant $c'$
depends on M and is chosen later. These boxes hold all cells accessed by M during the next
$(n/(4c'h\log\log n))^{1/d}/2$ steps. Let $y_i$ be a binary encoding of the contents of $B_i$ and $z_i$ be
a binary encoding of the relative position of access head i in $B_i$. Let $|B_i|$ be the volume
of $B_i$, and let $|y_i|$ and $|z_i|$ be the respective lengths of $y_i$ and $z_i$. For $c'$ sufficiently large,
$|y_i| \le c'|B_i|$ and $|z_i| \le c'|B_i|$ for every i. If M could process every possible question $Q_j$ with
the heads remaining in $B_1 \cup \cdots \cup B_h$, then from the string $y_1\#\cdots\#y_h\#z_1\#\cdots\#z_h$, only a
small constant amount of additional information (a binary description of this discussion) would
be necessary to generate x. Thus $K(x|y_1\#\cdots\#y_h\#z_1\#\cdots\#z_h) = O(1)$, but by Lemma 6.3,
$K(x|y_1\#\cdots\#y_h\#z_1\#\cdots\#z_h) \ge n/(8\log\log n)$, and we have a contradiction.

Therefore, since for each j there exists a $Q_j$ such that some head spends time
$\Omega((n/(4h\log\log n))^{1/d}/2)$ to exit $B_1 \cup \cdots \cup B_h$ when M processes question $Q_j$, the time
spent by M on the query part is at least $(n/(\log n \log\log n))(n/(4h\log\log n))^{1/d}/2 =
\Omega(n^{1+(1/d)}/(\log n\,(\log\log n)^{1+(1/d)}))$. □

6.2.2  Simulation  of  Unit-cost  RAMs 

Wagner and Wechsung (1986) show that every unit-cost RAM of time complexity t can be
simulated by a d-dimensional Turing machine in time $O(t^{2+(1/d)})$. We improve their result with
the following theorem.

Theorem 6.5 For $d \ge 2$, every unit-cost RAM of time complexity t(n) can be simulated on-line
by a d-dimensional Turing machine in time $O(t(n)^2 \log t(n))$.

Proof.  We  design  a  d-dimensional  Turing  machine  M  that  simulates  unit-cost  RAM  R.  Since 
we  do  not  know  n  or  t(n)  ahead  of  time,  we  use  the  procedure  of  Subsection  5.2.1,  doubling  t 
as  necessary. 
Each register accessed by R is represented on the d-dimensional tape by a box of cells called
a  record.  We  refer  to  the  coordinates  of  the  base  cell  of  a  record  as  the  coordinates  of  the 
record. Conceptually, records correspond to nodes in a height-balanced binary tree. An inorder
(symmetric-order) traversal of the tree would produce a list of the records sorted by the addresses
of the registers that the records represent.

Each  record  x  contains  a  register  address,  register  contents,  coordinates  of  the  parent  of  x  (in 
binary),  coordinates  of  the  left  and  right  children  of  x  (in  binary),  and  balancing  information. 
On  a  separate  worktape,  M  keeps  the  coordinates  of  the  next  unused  record  available  for 
insertion  into  the  tree. 

In t steps, R accesses at most t registers, so M creates at most t records. Thus M keeps
the contents of all registers used in the computation of R in a large box called the storage area.
The storage area consists of at most t records. Let the volume of the storage area be V.

Since a unit-cost RAM can double the contents of a register at each step, after t steps,
the maximum value in any register (and the maximum register address) is $2^t$, which can be
represented by t bits. The coordinates of any record can be represented using $O(\log V)$ bits,
and the balancing information requires only a constant number of bits. Thus each record is a
box of volume $O(t + \log V)$. Since there are at most t records, $V = O(t^2 + t\log V)$. We want to
make the volume of the storage area as small as possible, so choose $V = O(t^2)$. The side length
of the storage area is $O(t^{2/d})$.

To move to record x whose coordinates are specified in record y, M writes the coordinates
of record x on a separate worktape. M then uses the procedure of Lemma 2.1, so the time to
move a worktape head to record x is $O(t^{2/d})$.


Each step of R consists of O(1) accesses to memory. To simulate an access to register r(a),
M performs a search through the tree of records. Since the tree is height-balanced, M visits
$O(\log t)$ records. For each record x visited, M accesses O(t) cells to compare a with the register
address in x. The time to move the head to the next record is $O(t^{2/d})$. The time to access the
contents portion of a record is O(t). The time to perform an arithmetic operation on operands
of length O(t) is O(t). Since $d \ge 2$, the total time taken for each accessed record is O(t). To
keep the tree height-balanced requires adjustments to $O(\log t)$ records. Thus the time for each
step of R is $O(t\log t)$. The total time for the simulation is $O(t^2\log t)$. □
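
The bookkeeping in this proof can be collected in one place. The following restates the bounds given in the proof and adds no new result; it uses $\log V = O(\log t)$, which holds once $V = O(t^2)$ is chosen, and it leaves out the overhead of the doubling procedure.

```latex
\begin{align*}
V &= O(t^2 + t \log V) = O(t^2 + t \log t) = O(t^2), \\
\text{time per visited record} &= O(t^{2/d}) + O(t) + O(t) = O(t) \quad (d \ge 2), \\
\text{time per step of } R &= O(\log t) \cdot O(t) = O(t \log t), \\
\text{total time} &= t \cdot O(t \log t) = O(t^2 \log t).
\end{align*}
```

The condition $d \ge 2$ enters only in the second line, where it guarantees $t^{2/d} \le t$.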

Theorem 6.6 There is a unit-cost RAM R running in time n such that for any d-dimensional
multihead Turing machine M, M requires time $\Omega(n^{1+(1/d)}/\log n)$ to simulate R on-line.

Proof. Let T be the tree machine described in the lower bound proof of Loui (1983). Let
R be a unit-cost RAM that simulates T in real time (Theorem 5.1). Since T runs in time n,
R runs in time O(n). Now assume there is a d-dimensional Turing machine that simulates R
on-line in time $o(n^{1+(1/d)}/\log n)$. We thus have an on-line simulation of a tree machine of time
complexity n by a d-dimensional Turing machine running in time $o(n^{1+(1/d)}/\log n)$. But we
know from Loui (1983) that the lower bound on such a simulation is $\Omega(n^{1+(1/d)}/\log n)$; hence
we have a contradiction. □

6.2.3  Simulation  of  Unit-cost  SRAMs 

Wagner and Wechsung (1986) describe a simulation of a unit-cost SRAM of time complexity t by
a d-dimensional Turing machine in time $O(t^{1+(1/d)}\log t)$. We present the following improvement:

Theorem 6.7 For $d \ge 2$, every unit-cost successor RAM of time complexity t(n) can be
simulated on-line by a d-dimensional Turing machine in time $O(t(n)^{1+(1/d)}(\log t(n))^{1/d})$.


Proof. We design a d-dimensional Turing machine M that simulates pointer machine P.
The  theorem  follows  from  real-time  equivalence  of  pointer  machines  and  unit-cost  SRAMs. 
Since  we  do  not  know  n  or  t(n)  ahead  of  time,  we  again  employ  the  technique  of  repeatedly 
doubling  t  and  starting  over  as  necessary.  By  Lemma  4.13,  we  can  assume  that  P  has  a  pointer 
alphabet  of  size  2. 

As in the simulation of Theorem 6.5, M maintains information about each node of P in a
record, a box of volume $O(\log t)$. Let record i represent the ith node created by P. The record
whose base cell is the origin contains the coordinates of the record representing the center node
and the coordinates of the next available record (to represent the next node to be created by P).
For every other record in M, record i contains the coordinates of the records representing the
two nodes pointed to by node i in P. Since P creates at most t nodes, M maintains information
about all nodes in a large box of volume $O(t\log t)$ and side length $O((t\log t)^{1/d})$. Call this large
box the storage area.

Each step of P consists of O(1) accesses to nodes in the Δ-structure. M moves a worktape
head to a record in the storage area as described in Lemma 2.1, taking time proportional to the
side length of the storage area. Thus each access can be simulated by M in time $O((t\log t)^{1/d})$.
So the entire simulation takes time $O(t(t\log t)^{1/d})$. □

Theorem 6.8 There is a unit-cost successor RAM R running in time O(n) such that for
any multihead d-dimensional Turing machine M that simulates R on-line, M requires time
$\Omega(n^{1+(1/d)}/\log n)$.

Proof sketch. It is straightforward to show that every tree machine can be simulated by a
pointer machine, and thus by a unit-cost SRAM, in real time; therefore we can apply the proof
technique of Theorem 6.6. Assuming the contrary, we obtain an on-line simulation of a tree
machine of time complexity n by a d-dimensional Turing machine in time $o(n^{1+(1/d)}/\log n)$,
which contradicts the lower bound result of Loui (1983). □
Chapter 7


Open  Problems 


7.1  Improving  Pack  and  Unpack  Routines  for  Log-cost  RAMs 

We have used the pack and unpack routines of Katajainen et al. (1988) for several results. These
routines take time $O(b2^b + b\log b)$ on a RAM. A good question is whether these routines
can be improved; that is, whether a log-cost RAM can compute the b-bit representation of an
integer $n < 2^b$, or the numerical value of a b-bit string, in $O(b\log b)$ time. Clearly, if the b-bit
representation is to be in b registers, then $\Omega(b\log b)$ time is necessary to access and read each
of b different registers; but is it possible to improve the $O(b2^b)$ time needed to create the tables
used in the routines?

We can extend this packing problem as follows: How fast can a log-cost RAM pack b registers
each containing w bits into a single register? Call this the b-w packing problem. It is evident
that $\Omega(b\log b + bw)$ time is necessary, since the RAM needs $\Omega(b\log b)$ time to write down the
addresses of b distinct registers and $\Omega(bw)$ time to access b registers each containing w bits. We
can adapt the algorithms of Katajainen et al. (1988) to show that the b-w packing problem
can be solved in time $O(bw2^{bw} + b\log b + bw)$; the only major change is to ensure that the
proper values are contained in the first two registers of the origin table. The $O(bw2^{bw})$ term is
needed to construct the origin, lshift, and rshift tables.

If the b-w packing problem could be solved in time $O(b\log b + bw)$, then storing arbitrary
n-bit inputs would take $O(n\log^* n)$ time on a log-cost RAM. This upper bound would match
the lower bound on storing n-bit inputs presented by Schönhage (1988). Let pack(b,w,x) pack
into AC the contents of the b registers starting at r(x) with each register containing w bits.
The following algorithm, nstore, stores an n-bit input in optimal time, if the b-w packing
problem is solvable in time $O(b\log b + bw)$:
procedure nstore(n, a)
  * stores an n-bit number from input into a single register (n + a - 1) in memory
  if n = 1 then
    read input bit
    store bit in register (n + a - 1)
  else
    for i := 1 to n/log n do
      nstore(log n, i)
    endfor
    pack(n/log n, log n, log n)
    store contents of accumulator in register (n + a - 1)
  endif
end nstore


Let t(n) be the time necessary for a log-cost RAM R to execute nstore on an n-bit number.
For an n-bit number, R takes time $(n/\log n)\,t(\log n)$ to make the recursive calls to nstore. If
the b-w packing problem can be solved in time $O(b\log b + bw)$, then the call to pack takes
time $O((n/\log n)\log n)$. Access to memory takes time $O(\log n)$. Thus analysis of this algorithm
gives us (for some constants $k_1, \ldots, k_4$):

$t(1) = k_1$

$t(n) = (n/\log n)\,t(\log n) + k_2((n/\log n)\log n) + k_3\log n$
$\le (n/\log n)\,t(\log n) + k_4 n$,

so $t(n) = O(n\log^* n)$.
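
The growth rate of this recurrence can be checked numerically. The sketch below is an illustration rather than part of the analysis: it takes the recurrence with equality, sets $k_1 = k_4 = 1$ by default, uses base-2 logarithms, and evaluates only at the inputs 2, 4, 16, 65536 (each is 2 raised to the previous one), where every logarithm is an exact integer. The function names `log_star` and `nstore_time` are hypothetical.

```python
def log_star(n: int) -> int:
    """Number of times log2 is applied to n before the value drops to 1."""
    count = 0
    while n > 1:
        n = n.bit_length() - 1   # exact log2 when n is a power of two
        count += 1
    return count


def nstore_time(n: int, k1: int = 1, k4: int = 1) -> int:
    """Evaluate t(1) = k1 and t(n) = (n / log n) * t(log n) + k4 * n."""
    if n == 1:
        return k1
    log_n = n.bit_length() - 1   # exact log2 when n is a power of two
    return (n // log_n) * nstore_time(log_n, k1, k4) + k4 * n


towers = [2, 4, 16, 65536]       # each entry is 2 raised to the previous one
ratios = [nstore_time(n) // n for n in towers]
```

On these inputs the ratio t(n)/n equals $\log^* n + 1$, which agrees with the $O(n\log^* n)$ bound.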

Another effect of efficient b-w packing would be a strengthening of the lower bound on the
on-line simulation of a log-cost RAM by a d-dimensional Turing machine. By Theorem 6.4, we
have a lower bound of $\Omega(n^{1+(1/d)}/(\log n\,(\log\log n)^{1+(1/d)}))$. Theorem 6.4 used an incompressible
string of length $n/(2\log\log n)$, and the query part comprised only $O(n/(\log n \log\log n))$ questions.
With efficient b-w packing, we could use an incompressible string of length $n/(2\log^* n)$,
and the query part could consist of $O(n/(\log n \log^* n))$ questions. The resulting lower bound
would be $\Omega(n^{1+(1/d)}/(\log n\,(\log^* n)^{1+(1/d)}))$.

7.2  Time  versus  Space  and  Determinism  versus  Alternation 

One  goal  in  our  research  has  been  to  reproduce  results  for  pointer  machines  already  known 
for  Turing  machines.  We  presented  time  and  space  hierarchy  theorems  and  several  complexity 
results  for  nondeterministic  and  alternating  pointer  machines. 

There are still several open questions in this area. As we have mentioned, we would like
to duplicate the time vs. mass result of Halpern et al. (1986) for the capacity space measure
for pointer machines; i.e., we would like to show that every pointer machine running in time
t can be simulated by a pointer machine running in capacity $O(t/\log t)$. Such a result would
be important for at least two reasons. It would provide further evidence that capacity is the
proper measure of space complexity for pointer machines, and it would further establish the
machine-independence of the time vs. space result for Turing machines (Hopcroft et al., 1977).
We believe that this result is possible, but it appears that the approach taken by Halpern et al.
will not provide the desired result.

One approach is to answer another open problem about pointer machines. Dymond and
Tompa (1985) showed that every deterministic Turing machine running in time t can be
simulated by an alternating Turing machine running in time $t/\log t$. We would like to show that
adding the property of alternation to pointer machines yields a similar result; namely, that
every deterministic pointer machine running in time t can be simulated by an alternating pointer
machine running in time $O(t/\log t)$. We believe that this is possible, although proving it may
be difficult.

Suppose $\text{PM-TIME}(t) \subseteq \text{APM-TIME}(O(t/\log t))$. Then because $\text{APM-TIME}(t)
\subseteq \text{PM-CAPACITY}(O(t))$ (Theorem 4.17), it would follow that $\text{PM-TIME}(t) \subseteq
\text{PM-CAPACITY}(O(t/\log t))$. Thus this alternating vs. deterministic pointer machine time result
would imply the desired pointer machine time vs. capacity result.

The obvious approach to showing that alternating pointer machines are faster than
deterministic pointer machines is to adapt the proof of Dymond and Tompa. The difficulty is that
the key to their proof is a pebble game on the computation graph of a Turing machine. A
similar computation graph for a pointer machine does not seem to have the necessary
properties to exploit the same pebble game. It may be that the pointer machine computation graph
must  be  constructed  in  a  more  clever  manner,  or  perhaps  some  other  properties  of  the  more 
straightforward  computation  graph  can  be  used  to  prove  this  result.  Another  approach  is  to 
avoid  explicit  construction  of  the  computation  graph  altogether  as  in  Halpern  et  al.  (1986). 

7.3  Lower  Bound  on  Simulation  of  Multidimensional  Turing  Machines 

We are attempting to find a tight lower bound for the on-line simulation of multidimensional
Turing machines by log-cost RAMs. We believe that a lower bound of $\Omega(t(\log t)^{1-(1/d)})$
is possible. Although this lower bound would not match the current upper bound of
$O(t(\log t)^{1-(1/d)}(\log\log t)^{1/d})$ (Theorem 6.1), it would provide more insight into the possibilities
and limitations of dynamic representation of arrays within a set of registers.

Our current approach to this problem is to use Kolmogorov complexity, as in Theorem 5.3,
where we established a lower bound on simulating a tree machine by a log-cost RAM. We
describe a d-dimensional machine M running in real time that we believe cannot be simulated
by any log-cost RAM in time $o(t(\log t)^{1-(1/d)})$.

Again we have a filling part and a query part. For the filling part, M fills a region A of its
worktape with an incompressible string x of length O(n) such that the distance from the origin
to any cell on the boundary of A is $O(n^{1/d})$. A question drives the worktape head from the
origin to a boundary cell of A, and back to the origin. The only other movement restriction
on the head during a question is that when the head is moving toward the boundary of A,
it must always move away from the origin. Thus a question takes time $O(n^{1/d})$ on M. The
query part consists of $O(n^{1-(1/d)})$ questions. Our goal is to show that for every RAM R that
simulates M, after i questions, there is always an (i + 1)st question on which R takes time
$\Omega(n^{1/d}(\log n)^{1-(1/d)})$. The argument below leads us to believe that such a question exists.

We construct a question q as follows: RAM R begins processing q by accessing register $r_1$.
We choose the initial portion of q so that R is forced to access some different register $r_2$ as soon
as possible. We then choose the next portion of q so that R must access another register as
soon as possible. We continue in this manner until we have constructed the entire question.

Say R accesses, in order, registers $r_1, r_2, \ldots, r_m$ to process q. Let $a_i$ be the portion of
the incompressible string x output by R after R accesses $r_i$ but before R accesses $r_{i+1}$. For
$i = 1, \ldots, m$, let $\ell_i$ be the length of $a_i$. Note that $\sum_{i=1}^{m} \ell_i = n^{1/d}$. Since we chose q so that
it forced an access to a new register as soon as possible, $a_i$ must be the shortest portion of x
that R could output on this access to $r_i$. Thus the length of the contents of $r_i$ at the time it is
accessed (call this length $L_i$) is $\Omega((\ell_i)^d)$ (see Figure 7.1).

Figure 7.1: Region of worktape that $r_i$ could process

Let $m^*$ be the number of distinct registers among $r_1, r_2, \ldots, r_m$. The time $t_q$ for R to answer
the question q is $\Omega(m^*\log m^*)$ to specify $m^*$ distinct register addresses plus $\Omega(\sum_{i=1}^{m} (\ell_i)^d)$. By
Jensen's inequality, $\sum_{i=1}^{m} (\ell_i)^d \ge m((\sum_{i=1}^{m} \ell_i)/m)^d = m(n^{1/d}/m)^d$.

We would like to be able to show that among all the questions we could construct in the
above manner, there is at least one question where $m = m^*$; i.e., where all registers accessed
are distinct. It might be possible to show that if such a question does not exist, then some
group of registers specify more than their share of information about x, hence the string x is
compressible. If we could prove the existence of such a question, then we could claim:

$t_q \ge m\log m + m(n^{1/d}/m)^d = m\log m + nm^{1-d}$.

The right side is minimized when $m = (n/\log n)^{1/d}$. Thus $t_q = \Omega(n^{1/d}(\log n)^{1-(1/d)})$. This
would give us our lower bound.
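
The choice of m can be checked by balancing the two terms of the bound. This is a heuristic sketch that ignores constant factors, assumes $d \ge 2$, and treats $\log m$ as $\Theta(\log n)$, which holds at the stated value of m because $\log m = (1/d)(\log n - \log\log n)$ there.

```latex
\begin{align*}
m \log m = n\, m^{1-d}
  &\iff m^{d} \log m = n
  \;\Longrightarrow\; m = \Theta\!\left( (n/\log n)^{1/d} \right), \\
m \log m
  &= \Theta\!\left( (n/\log n)^{1/d} \log n \right)
   = \Theta\!\left( n^{1/d} (\log n)^{1-(1/d)} \right).
\end{align*}
```

Since the first term increases with m and the second decreases, the sum at the balance point is within a constant factor of its minimum.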

Unfortunately, we have not yet been able to apply an incompressibility argument to prove
the existence of a question where $m = m^*$, nor do we know whether such an argument is
possible. We may need to focus on some other aspect of the construction of the questions. In
any case, we believe this approach or one similar will yield the desired lower bound, and we
shall continue to investigate the problem.